On Thu, May 17, 2007 at 10:02:45PM -0500, Robin Holt wrote:
> On Thu, May 17, 2007 at 08:16:55AM -0600, David Mosberger-Tang wrote:
> > On 5/17/07, Keith Owens <[EMAIL PROTECTED]> wrote:
> >
> > >David Mosberger
> > >reckons that unwind should never cause an error, maybe we should be
> > >looking at adding more checks to the unwind code to cope with spurious
> > >addresses?
> >
> > That's correct.  If the unwinder causes MCAs, it's broken.  Robin, can
> > you look into why the memory-access safety-checks in the unwinder
> > aren't sufficient to avoid the MCAs you're seeing?
>
> I don't think it got very far at all.
>
> The task in question is calling get_wchan on itself.  It is at
> >> px *(task_struct *)0xe003819a00000000 | grep ksp
> ksp = 0xe003819a00007900
> >> px 0xe003819a00007900 + 16
> 0xe003819a00007910
> >> px *(switch_stack *)0xe003819a00007910 | grep bsp
> ar_bspstore = 0xe003819a00000000
>
> Here we start to run into difficulties.  ar_bspstore is the same address
> as our task_struct.  info->regstk.top == 0xe003819a00000000 which leads
> to unw_init_frame_info calculating info->bsp == 0xe0038199ffffff30
> which is near the addresses causing problems (0xe0038199ffffff80 and
> 0xe0038199ffffffe0).  Notice it is in the page before our task_struct.
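As a sanity check on the numbers quoted above, here is a small user-space sketch of the register-stack arithmetic. As far as I can tell, unw_init_frame_info derives info->bsp by stepping back from info->regstk.top over the frame's locals with ia64_rse_skip_regs(); the sketch is modeled on that helper but is not the kernel code. The number of locals (sol) is not stated anywhere in this thread. The value 25 is an assumption, chosen only because it reproduces the quoted info->bsp. The function names are local to the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Slot index (0..63) of an 8-byte word within its 64-slot RSE group.
 * Slot 63 holds a NaT collection word instead of a register. */
uint64_t rse_slot_num(uint64_t addr)
{
    return (addr >> 3) & 0x3f;
}

/* Step num_regs registers from addr (negative means toward lower
 * addresses), skipping NaT collection slots along the way.
 * Modeled on the kernel's ia64_rse_skip_regs(); a sketch only. */
uint64_t rse_skip_regs(uint64_t addr, int64_t num_regs)
{
    int64_t delta = (int64_t)rse_slot_num(addr) + num_regs;

    if (num_regs < 0)
        delta -= 0x3e;
    /* Each slot is 8 bytes; delta / 0x3f counts the NaT slots crossed. */
    return addr + (uint64_t)((num_regs + delta / 0x3f) * 8);
}

/* ASSUMPTION: sol == 25 (not given in the thread). With that value the
 * result matches the info->bsp quoted above. */
void check_quoted_bsp(void)
{
    assert(rse_skip_regs(0xe003819a00000000ULL, -25) == 0xe0038199ffffff30ULL);
}
```

With regstk.top sitting exactly on the task_struct address, any non-zero number of locals puts bsp below the start of the task's allocation, which is consistent with the faulting addresses lying in the preceding page.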
I think I have everything figured out now.

Address range for our task's switch stack is 0xe001849a00007910 to
0xe001849a00007b20.  Our unw_frame_info structure, allocated by
get_wchan() on the memory stack, happens to reside at 0xe001849a00007b20
to 0xe001849a00007ce8.

Assume we are in get_wchan and r12 == 0xe001849a00007b20.  We take an
interrupt.  The switch stack gets allocated on the memory stack in the
address range given above.  Upon return from the interrupt, we proceed
to call unw_init_from_blocked_task(), which calls unw_init_frame_info().

unw_init_frame_info does:

0xa000000100041aa0 <unw_init_frame_info>:       [MMI]       alloc r36=ar.pfs,9,6,0
0xa000000100041aa6 <unw_init_frame_info+0x6>:               adds r12=-16,r12
0xa000000100041aac <unw_init_frame_info+0xc>:               mov r35=b0
0xa000000100041ab0 <unw_init_frame_info+0x10>:  [MII]       nop.m 0x0
0xa000000100041ab6 <unw_init_frame_info+0x16>:              mov r38=r32;;
0xa000000100041abc <unw_init_frame_info+0x1c>:              adds r9=16,r12
0xa000000100041ac0 <unw_init_frame_info+0x20>:  [MMI]       mov r39=r0
0xa000000100041ac6 <unw_init_frame_info+0x26>:              nop.m 0x0
0xa000000100041acc <unw_init_frame_info+0x2c>:              mov r40=456;;
0xa000000100041ad0 <unw_init_frame_info+0x30>:  [MIB]       st8 [r9]=r33

which ends up placing r33 (struct task_struct *) onto the stack at
exactly the location of the no-longer-valid switch_stack struct pointed
to by this thread's ->ksp.

So to hit this, we need to take an interrupt in get_wchan, when called
on our own task, between the time r12 is updated to allocate the
unw_frame_info structure and the time unw_init_from_blocked_task() is
called.  Seeing how that is only a few instructions, I would expect this
to be a fairly small window of opportunity.

I am going to submit two patches.  One improves the error checking in
the unwind functions.  The other is essentially the patch I produced
yesterday.
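For anyone who wants to double-check the layout, the arithmetic on the two address ranges can be sketched in a few lines of user-space C. The macro and function names are made up for this sketch, and treating each range as half-open (start inclusive, end exclusive) is an assumption on my part.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses taken from the message; the names are hypothetical. */
#define SWITCH_STACK_START 0xe001849a00007910ULL
#define SWITCH_STACK_END   0xe001849a00007b20ULL
#define FRAME_INFO_START   0xe001849a00007b20ULL
#define FRAME_INFO_END     0xe001849a00007ce8ULL

/* Size in bytes of the range [start, end), ASSUMING half-open ranges. */
uint64_t range_size(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Nonzero when the first range ends exactly where the second begins. */
int ranges_adjacent(uint64_t first_end, uint64_t second_start)
{
    return first_end == second_start;
}

/* The figures that follow from the quoted addresses. */
void check_layout(void)
{
    assert(range_size(SWITCH_STACK_START, SWITCH_STACK_END) == 528);
    assert(range_size(FRAME_INFO_START, FRAME_INFO_END) == 456);
    assert(ranges_adjacent(SWITCH_STACK_END, FRAME_INFO_START));
}
```

The second range comes out at 456 bytes, the same value as the immediate in "mov r40=456" in the listing, which suggests (but does not prove) that the immediate is the size of the unw_frame_info structure. The two ranges share the boundary 0xe001849a00007b20, so the interrupt's switch stack sits directly below the structure get_wchan() has on its stack.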
Thanks,
Robin
