Benjamin Herrenschmidt wrote: > On Fri, 2011-07-01 at 18:03 +0800, tiejun.chen wrote: >> Here emulate_step() is called to emulate 'stwu'. Actually this is equivalent >> to >> 1> update pr_regs->gpr[1] = mem(old r1 + (-A)) >> 2> 'stw <old r1>, mem<(old r1 + (-A)) > >> >> You should notice the stack based on new r1 would be covered with mem<old r1 >> +(-A)>. So after this, the kernel exit from post_krpobe, something would be >> broken. This should depend on sizeof(-A). >> >> For kprobe show_interrupts, you can see pregs->nip is re-written violently so >> kernel issued. >> >> But sometimes we may only re-write some violate registers the kernel still >> alive. And so this is just why the kernel works well for some kprobed point >> after you change some kernel options/toolchains. >> >> If I'm correct its difficult to kprobe these stwu sp operation since the >> sizeof(-A) is undermined for the kernel. So we have to implement in-depend >> interrupt stack like PPC64. > > So I've spent a lot of time trying to find a better way to fix that bug > and I think I might have finally found one :-)
I can understand what you mean in below since I remember you already clarified this way previously. > > - When you try to emulate stwcx on the kernel stack (and only there), I think it should be stwu/stdu. > don't actually perform the store. Set a TIF flag instead to indicate > special processing in the exception return path and store the address to > update somewhere either in a free slot of the stack frame itself of > somewhere in the thread struct (the former would be easier). You may as > well do some sanity checking on the value while at it to catch errors > early. > > - In the exception return code, we already test for various TIF flags > (*** see comment below, it's even buggy today for preempt ***), so we > add a test for that flag and go to do_work. > > - At the end of do_work, we check for this TIF flag. If it's not set or > any other flag is set, move on as usual. However, if it's the only flag > still set: > > - Copy the exception frame we're about to unwind to just -below- the > new r1 value we want to write to. Then perform the write, and change > r1 to point to that copy of the frame. > > - Branch to restore: which will unwind everything from that copy of > the frame, and eventually set r1 to GPR(1) in the copy which contains > the new value of r1. We still can't restore this there. As you know this emulated store instruction can touch any filed inside pt_regs. Sometimes nip would be involved for this problematic location. And unfortunately, this is just that we meet currently. So we have to go exc_exit_restart. .globl exc_exit_restart exc_exit_restart: lwz r11,_NIP(r1) lwz r12,_MSR(r1) I mean we have to do that real restore here. So I'm really not sure if its a good way to add such a codes including check TIF/copy-get new r1/restore operation here since this is so deep for the exception return code. exc_exit_start: mtspr SPRN_SRR0,r11 mtspr SPRN_SRR1,r12 > > This is the less intrusive approach and should work just fine, it's also > more robust than anything I've been able to think of and the approach > would work for 32 and 64-bit similarily. > > (***) Above comment about a bug: If you look at entry_64.S version of > ret_from_except_lite you'll notice that in the !preempt case, after > we've checked MSR_PR we test for any TIF flag in _TIF_USER_WORK_MASK to > decide whether to go to do_work or not. However, in the preempt case, we > do a convoluted trick to test SIGPENDING only if PR was set and always > test NEED_RESCHED ... but we forget to test any other bit of > _TIF_USER_WORK_MASK !!! So that means that with preempt, we completely > fail to test for things like single step, syscall tracing, etc... > This is another problem we should address. > I think this should be fixed at the same time, by simplifying the code > by doing: > > - Test PR. If set, go to test_work_user, else continue (or the other > way around and call it test_work_kernel) > > - In test_work_user, always test for _TIF_USER_WORK_MASK to decide to > go to do_work, maybe call it do_user_work > > - In test_work_kernel, test for _TIF_KERNEL_WORK_MASK which is set to > our new flag along with NEED_RESCHED if preempt is enabled and branch to > do_kernel_work. > > do_user_work is basically the same as today's user_work > > do_kernel_work is basically the same as today preempt block with added > code to handle the new flag as described above. > > Is anybody volunteering for fixing that ? I don't have the bandwidth I always use one specific kprobe stack to fix this for BOOKE and work well in my local tree :) Do you remember my v3 patch? I think its possible to extend this for all PPC variants. Anyway, I'd like to be this volunteer with our last solution. Tiejun > right now, but if nobody shows up I suppose I'll have to eventually deal > with it myself :-) > > Cheers, > Ben. _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev