On 9/21/22 00:17, Christophe Leroy wrote: > Le 21/09/2022 à 05:33, Samuel Holland a écrit : >> On 9/19/22 07:37, Michael Ellerman wrote: >>> Christophe Leroy <christophe.le...@csgroup.eu> writes: >>>> Le 16/09/2022 à 07:05, Samuel Holland a écrit : >>>>> With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible >>>>> to switch away from a task inside copy_{from,to}_user. This left the CPU >>>>> with userspace access enabled until after the next IRQ or privilege >>>>> level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when >>>>> switching back to the original task, the userspace access would fault: >>>> >>>> This is not supposed to happen. You never switch away from a task >>>> magically. Task switch will always happen in an interrupt, that means >>>> copy_{from,to}_user() get interrupted. >>> >>> Unfortunately this isn't true when CONFIG_PREEMPT=y. >>> >>> We can switch away without an interrupt via: >>> __copy_tofrom_user() >>> -> __copy_tofrom_user_power7() >>> -> exit_vmx_usercopy() >>> -> preempt_enable() >>> -> __preempt_schedule() >>> -> preempt_schedule() >>> -> preempt_schedule_common() >>> -> __schedule() >>> >>> I do some boot tests with CONFIG_PREEMPT=y, but I realise now those are >>> all on Power8, which is a bit of an oversight on my part. >>> >>> And clearly no one else tests it, until now :) >>> >>> I think the root of our problem is that our KUAP lock/unlock is at too >>> high a level, ie. we do it in C around the low-level copy to/from. >>> >>> eg: >>> >>> static inline unsigned long >>> raw_copy_to_user(void __user *to, const void *from, unsigned long n) >>> { >>> unsigned long ret; >>> >>> allow_write_to_user(to, n); >>> ret = __copy_tofrom_user(to, (__force const void __user *)from, n); >>> prevent_write_to_user(to, n); >>> return ret; >>> } >>> >>> There's a reason we did that, which is that we have various different >>> KUAP methods on different platforms, not a simple instruction like other >>> arches. >>> >>> But that means we have that exit_vmx_usercopy() being called deep in the >>> guts of __copy_tofrom_user(), with KUAP disabled, and then we call into >>> the preempt machinery and eventually schedule. >>> >>> I don't see an easy way to fix that "properly", it would be a big change >>> to all platforms to push the KUAP save/restore down into the low level >>> asm code. >>> >>> But I think the patch below does fix it, although it abuses things a >>> little. Namely it only works because the 64s KUAP code can handle a >>> double call to prevent, and doesn't need the addresses or size for the >>> allow. >>> >>> Still I think it might be our best option for an easy fix. >>> >>> Samuel, can you try this on your system and check it works for you? >> >> It looks like your patch works. Thanks for the correct fix! > > Instead of the patch from Michael, could you try by replacing > preempt_enable() by preempt_enable_no_resched() in exit_vmx_usercopy() ?
I finally got a chance to test this, and the simpler fix of using preempt_enable_no_resched() works as well. >> I replaced my patch with the one below, and enabled >> CONFIG_PPC_KUAP_DEBUG=y, and I was able to do several kernel builds >> without any crashes or splats in dmesg. > > Did you try CONFIG_PPC_KUAP_DEBUG without the patch ? Did it detect any > problem ? I believe I did at one point, and it did not detect anything. Regards, Samuel