On Mon, May 17, 2010 at 10:32:41PM +1000, Paul Mackerras wrote: > On Fri, May 14, 2010 at 12:25:31PM +0530, K.Prasad wrote: > > > Okay. I will re-use single_step_exception() after modifications; it > > appearsto have no in-kernel users for it. > > It's called from exceptions-64s.S, head_32.S and head_8xx.S in > arch/powerpc/kernel. > > > > Suppose the address at which the data breakpoint has been unmapped, > > > and the process has a handler for the SIGSEGV signal. When we try to > > > single-step the load or store, we will get a DSI (0x300) interrupt, > > > call into do_page_fault, and end up sending the process a SIGSEGV. > > > That will invoke the signal handler, which can then do anything it > > > likes. It can do a blocking system call, it can longjmp() back into > > > its main event, or it can return from the signal handler. Only in the > > > last case will it retry the load or store, and then only if the signal > > > handler hasn't modified the NIP value in the signal frame. That's > > > what I mean by "doesn't return to the instruction". > > > > > > > At the outset, this seemed to be a scary thing to happen; but turns out > > to be harmful only to the extent of generating a false hw-breakpoint > > exception in certain cases. A case-by-case basis analysis reveals thus: > > > > Consider an instruction stream i1, i2, i3, ... iN, where i1 has > > finished execution and i2 is about to be executed but has generated a > > DSI interrupt with the above-mentioned conditions i.e. DSISR indicates a > > DABR match + Page-Table entry not found. Now according to do_hash_page > > in exception-64s.S (as pasted below), do_page_fault() and do_dabr() are > > invoked one after the other. > > > > _STATIC(do_hash_page) > > std r3,_DAR(r1) > > std r4,_DSISR(r1) > > > > andis. r0,r4,0xa410 /* weird error? */ > > bne- handle_page_fault /* if not, try to insert a HPTE */ > > andis. r0,r4,dsisr_dabrma...@h > > bne- handle_dabr_fault > > Note that bne is not a procedure call; we'll actually get two DSIs in > this scenario. But I don't think that matters. Also note that the > branch to handle_page_fault here is not for the HPTE-not-found case; > it's for the unusual errors. So we'll handle the HPTE insertion after > handling the DABR match. >
Okay. > > Thus, when control returns to user-space to instruction 'i2', the > > hw_breakpoint_handler() has completed execution, and a SIGSEGV is pending > > to be delivered and single-stepping enabled MSR_SE is set. Upon return to > > user-space, the handler for SIGSEGV is executed and it may perform one of > > the following (as you stated previously): > > (a) Make a blocking syscall, eventually yielding the CPU to a new thread > > (b) Jump to a different instruction in user-space, say iN, and not complete > > the execution of instruction i2 at all. > > (c) Return to instruction i2 and complete the execution. > > > > In case of (a), the context-switches should not affect the ability to > > single-step the instruction when the thread is eventually scheduled to > > run. The thread, when scheduled onto the CPU will complete signal > > handling, return to execute instruction i2, cause single-step exception, > > restore breakpoints and run smoothly thereafter. > > Right. However, the thread is running the signal handler without the > DABR being set, which is unfortunate. > In order to keep the breakpoint active during signal handling, a PowerPC specific signal handling code, say do_signal_pending() in arch/powerpc/kernel/signal.c, must be tapped to check for/restore the breakpoint for the process (if it exists). Then, single_step_dabr_instruction() must be suitably modified to not clear the current->thread.last_hit_ubp pointer if breakpoint has been taken in a nested condition i.e. a breakpoint exception in signal-handler which was preceded by a previous breakpoint exception in normal user-space stack. A flag to denote such a condition would be required in 'struct thread_struct'. I'm afraid if this is more complexity than we want to handle in the first patchset. I agree that this will create a 'blind-spot' of code which cannot not be monitored using breakpoints and may limit debugging efforts (specially for memory corruption); however suspecting that signal handlers (especially those that don't return to original instruction) would be rare, I feel that this could be a 'feature' that can be brought later-in. What do you think? > > In case of (b), the new instruction iN is single-stepped, the breakpoint > > values are restored and the hw-breakpoint exception callback is invoked > > after iN is executed. The user of this breakpoint i.e. the caller of > > register_user_hw_breakpoint() who had placed a breakpoint on addressed > > accessed by instruction i2 will be confused to find that an unrelated > > instruction (which may not be a load/store) has caused the breakpoint. > > That's the case if the signal handler modifies the NIP value in the > register set saved on the stack and returns. If the signal handler > instead simply jumps to instruction iN (e.g. with longjmp or > siglongjmp), we'll never get the single-step callback. > True. As described above, this will lead to unreliable restoration of breakpoints. I'm not sure if this is a common occurrence in user-space, but if rare, I think we should be able to tolerate this behaviour for now. > > If so desired, we may adopt the 'trigger-before-execute' semantics for > > user-space breakpoints wherein the hw-breakpoint callback (through > > perf_bp_event()) is invoked in hw_breakpoint_handler() itself. This > > would indicate to the user that the impending instruction causes a DABR > > 'hit' but it may or may not be executed due to the role of > > signal-handler or due to self-modifying code (as mentioned below). > > > > Kindly let me know what you think about it. > > > > (c) is the normal execution path we desire. The instruction i2 will be > > safely single-stepped and breakpoints are restored. > > > > > The instruction could be changed underneath us if the program is > > > multi-threaded and another thread writes another instruction to the > > > instruction word where the load or store is. Or it could use mmap() > > > to map some other page at the address of the load or store. Either > > > way we could end up with a different instruction there. > > > > > > > If the instruction that originally caused the DABR exception is changed, > > the new instruction in its place would still single-step to restore > > breakpoint values. However the user of breakpoint interface will be > > confused to find that the callback is invoked for an irrelevant > > instruction. > > > > It could be circumvented, to an extent, through the use of > > trigger-before-execute semantics (as described before). > > I don't think we want to do trigger-before-execute. Ideally what we > want to ensure at all times is that either DABR is set (enabled) or > MSR.SE is set, but not both. To ensure that we'd have to modify the > signal delivery code and possibly other places. > The case where both DABR and MSR_SE are set is when a page-fault resulting in context-switch occurs before single-stepping, right? I have responded to that below. The problem with signal-handling, as I understand, is unreliable restoration of/lost breakpoints and I wish to deal them in future patchsets. Let me know if my understanding is incorrect. > > > If we do get a context switch, e.g. as a result of a page fault, and > > > then switch back to the task, it looks to me like we will end up with > > > MSR_SE and DABR both set. I don't suppose that will actually cause > > > any real problem besides double-counting the hit. > > > > > > > Page fault exception will be handled before hw_breakpoint_handler(), > > hence MSR_SE would not have been set if a context-switch happened in > > pange-fault handling itself. I don't see a case where both MSR_SE and > > DABR will be set together. > > Imagine this scenario: we get the DABR match, set MSR_SE and return to > the task. In the meantime another higher-priority task has become > runnable and our need_resched flag is set, so we call schedule() on > the way back out to usermode. The other task runs and then blocks and > our task gets scheduled again. As part of the context switch, > arch_install_hw_breakpoint() will get called and will set DABR. So > we'll return to usermode with both DABR and MSE_SE set. > I didn't foresee such a possibility. I think this can be handled by putting a check in arch_install_hw_breakpoint() as shown below: int arch_install_hw_breakpoint(struct perf_event *bp) { struct arch_hw_breakpoint *info = counter_arch_bp(bp); struct perf_event **slot = &__get_cpu_var(bp_per_reg); *slot = bp; if (!current->thread.last_hit_ubp) set_dabr(info->address | info->type | DABR_TRANSLATION); return 0; } This should prevent DABR and MSR_SE from being enabled at the same time. > > Thanks for the comments. Let me know if the analysis above is incorrect > > or if I've failed to recognise any important issue that you pointed out. > > I will send out a patch with changes to emulate_single_step() in the > > next version of the patchset, if I don't hear any further comments. > > We haven't discussed at all the case where the breakpoint is a per-cpu > breakpoint or where it's a per-task breakpoint but the DABR match > occurs within the kernel -- which can happen, even for a user address, > in __get_user, __put_user, __copy_tofrom_user, etc. If the access > there is to a bad address, we'll invoke the exception case in > bad_page_fault(), which looks to be another place where we need to > recognize that single-stepping won't succeed and reinstall the DABR > setting. Do we count that as an event or not? - I'm not sure. > I'll respond to this part in the subsequent mail. Thanks, K.Prasad _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev