at 9:29 PM, Andy Lutomirski <l...@kernel.org> wrote: >> On Oct 18, 2018, at 6:08 PM, Nadav Amit <na...@vmware.com> wrote: >> >> at 10:00 AM, Andy Lutomirski <l...@amacapital.net> wrote: >> >>>> On Oct 18, 2018, at 9:47 AM, Nadav Amit <na...@vmware.com> wrote: >>>> >>>> at 8:51 PM, Andy Lutomirski <l...@amacapital.net> wrote: >>>> >>>>>> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit <na...@vmware.com> wrote: >>>>>> at 6:22 PM, Andy Lutomirski <l...@amacapital.net> wrote: >>>>>> >>>>>>>> On Oct 17, 2018, at 5:54 PM, Nadav Amit <na...@vmware.com> wrote: >>>>>>>> >>>>>>>> It is sometimes beneficial to prevent preemption for very few >>>>>>>> instructions, or prevent preemption for some instructions that precede >>>>>>>> a branch (this latter case will be introduced in the next patches). >>>>>>>> >>>>>>>> To provide such functionality on x86-64, we use an empty REX-prefix >>>>>>>> (opcode 0x40) as an indication that preemption is disabled for the >>>>>>>> following instruction. >>>>>>> >>>>>>> Nifty! >>>>>>> >>>>>>> That being said, I think you have a few bugs. First, you can’t just >>>>>>> ignore >>>>>>> a rescheduling interrupt, as you introduce unbounded latency when this >>>>>>> happens — you’re effectively emulating preempt_enable_no_resched(), >>>>>>> which >>>>>>> is not a drop-in replacement for preempt_enable(). To fix this, you may >>>>>>> need to jump to a slow-path trampoline that calls schedule() at the end >>>>>>> or >>>>>>> consider rewinding one instruction instead. Or use TF, which is only a >>>>>>> little bit terrifying… >>>>>> >>>>>> Yes, I didn’t pay enough attention here. For my use-case, I think that >>>>>> the >>>>>> easiest solution would be to make synchronize_sched() ignore preemptions >>>>>> that happen while the prefix is detected. It would slightly change the >>>>>> meaning of the prefix. >>>> >>>> So thinking about it further, rewinding the instruction seems the easiest >>>> and most robust solution. I’ll do it. >>>> >>>>>>> You also aren’t accounting for the case where you get an exception that >>>>>>> is, in turn, preempted. >>>>>> >>>>>> Hmm.. Can you give me an example for such an exception in my use-case? I >>>>>> cannot think of an exception that might be preempted (assuming #BP, #MC >>>>>> cannot be preempted). >>>>> >>>>> Look for cond_local_irq_enable(). >>>> >>>> I looked at it. Yet, I still don’t see how exceptions might happen in my >>>> use-case, but having said that - this can be fixed too. >>> >>> I’m not totally certain there’s a case that matters. But it’s worth >>> checking >> >> I am still checking. But, I wanted to ask you whether the existing code is >> correct, since it seems to me that others do the same mistake I did, unless >> I don’t understand the code. >> >> Consider for example do_int3(), and see my inlined comments: >> >> dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code) >> { >> ... >> ist_enter(regs); // => preempt_disable() >> cond_local_irq_enable(regs); // => assume it enables IRQs >> >> ... >> // resched irq can be delivered here. It will not caused rescheduling >> // since preemption is disabled >> >> cond_local_irq_disable(regs); // => assume it disables IRQs >> ist_exit(regs); // => preempt_enable_no_resched() >> } >> >> At this point resched will not happen for unbounded length of time (unless >> there is another point when exiting the trap handler that checks if >> preemption should take place). > > I think it's only a bug in the cases where someone uses extable to fix > up an int3 (which would be nuts) or that we oops. But I should still > fix it. In the normal case where int3 was in user code, we'll miss > the reschedule in do_trap(), but we'll reschedule in > prepare_exit_to_usermode() -> exit_to_usermode_loop().
Thanks for your quick response, and sorry for bothering instead of dealing with it. Note that do_debug() does something similar to do_int3(). And then there is optimized_callback() that also uses preempt_enable_no_resched(). I think the original use was correct, but then a19b2e3d7839 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized kprobes”) removed the IRQ disabling, while leaving preempt_enable_no_resched() . No?