[BUG] (alpha) kernel thread panics due to stale PTBR settings in 2.3.47

Dave Anderson Fri, 25 Feb 2000 12:52:53 -0800

Hello,

Sorry for the wide distribution -- I'm not sure who this should be directed
to...

We had been seeing panics in the alpha 2.3.41 stream where a kernel thread,
typically one of the nfsd daemons or kswapd, faults on the swap_info swap_map
address, which is a mapped (vmalloc'd) address. The problem was due to
the disconnect between the active_mm pgd value and what's actually stored
in the kernel task's ptbr value -- which is what gets loaded into the PTBR
register with each alpha context switch. Eventually kernel tasks find
that the physical address stored in their thread_struct's ptbr has become
stale, because the page it references has been freed and re-used elsewhere.

I note that in 2.3.47 the problem appears to have been addressed by
the addition of the enter_lazy_tlb() call in schedule():

        if (!mm) {
                if (next->active_mm) BUG();
                next->active_mm = oldmm;
                atomic_inc(&oldmm->mm_count);
+++           enter_lazy_tlb(oldmm, next, this_cpu);
        }

Unfortunately the alpha enter_lazy_tlb() doesn't do anything:

static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk, unsigned cpu)
{
}

If this is still a work in progress, excuse my interruption, but if not,
the alpha enter_lazy_tlb() should update the kernel task's ptbr with the
oldmm's pgd. Right?
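
To make the suggestion concrete, here is a minimal user-space sketch of the
update I have in mind. It is not the kernel code: mm_model, task_model,
pgd_to_ptbr() and enter_lazy_tlb_model() are made-up names for illustration,
and it assumes 8 KB pages (PAGE_SHIFT of 13) and a direct-mapped kernel window
starting at fffffc0000000000, which matches the addresses in the dump below.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 8 KB pages, and kernel virtual addresses
 * in the direct-mapped window that starts at 0xfffffc0000000000. */
#define PAGE_SHIFT 13
#define IDENT_ADDR 0xfffffc0000000000ULL

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mm_model {
    uint64_t pgd;               /* kernel virtual address of the pgd page */
};

struct task_model {
    struct mm_model *active_mm; /* mm borrowed by the kernel thread */
    uint64_t ptbr;              /* page frame number loaded into PTBR */
};

/* Convert a direct-mapped pgd address into a page frame number. */
static uint64_t pgd_to_ptbr(uint64_t pgd_kva)
{
    assert(pgd_kva >= IDENT_ADDR);
    return (pgd_kva - IDENT_ADDR) >> PAGE_SHIFT;
}

/* What a non-empty enter_lazy_tlb() could do: keep the borrowing
 * task's ptbr in step with the pgd of the mm it has just adopted. */
static void enter_lazy_tlb_model(struct mm_model *mm, struct task_model *tsk)
{
    tsk->ptbr = pgd_to_ptbr(mm->pgd);
}
```

With this in place, the ptbr that is loaded on the next context switch always
points at the pgd of the mm the kernel thread is currently borrowing, rather
than at whatever page held an earlier pgd.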

If you're interested in the details, here's the evidence from a 2.3.47 crash
dump, in which kswapd panicked trying to reference a swap_map address at
fffffe0000000032:

crash> bt
PID: 2 TASK: fffffc001fd64000 CPU: 0 COMMAND: "kswapd"
#0 [fffffc001fd67ad0] crash_save_current_state at fffffc0000336ffc
#1 [fffffc001fd67ae0] panic at fffffc00003271f8
#2 [fffffc001fd67b80] die_if_kernel at fffffc00003113d0
#3 [fffffc001fd67bb0] do_page_fault at fffffc000031fecc
#4 [fffffc001fd67bf0] entMM at fffffc000031055c
EFRAME: fffffc001fd67c28      R24: 0000000000000cec
     R0: 0000000000000001      R25: 0000000000000007
     R1: fffffe0000000032      R26: fffffc0000350aec <__delete_from_swap_cache+0x8c>
     R2: 0000000000000003      R27: fffffc00003514c0
     R3: 0000190000000000      R28: 0000000000000000
     R4: fffffc000052d888      HAE: 0000000000000000
     R5: 0000000000000200 TRAP_A0: fffffe0000000032
     R6: fffffc00006329d0 TRAP_A1: 0000000000000001
     R7: fffffc001fd67dc0 TRAP_A2: 0000000000000000
     R8: fffffc001fd64000       PS: 0000000000000000
    R19: 0000000000000400       PC: fffffc0000351544 <__swap_free+0x84>
    R20: fffffc00005317c0       GP: fffffc0000554030
    R21: 0000000000000000      R16: 0000190000000000
    R22: 0000000000000006      R17: 0000000000000001
    R23: fffffc0000345244      R18: 0000000000000059
#5 [fffffc001fd67d10] __swap_free at fffffc0000351544
#6 [fffffc001fd67d50] __delete_from_swap_cache at fffffc0000350aec
#7 [fffffc001fd67d60] shrink_mmap at fffffc0000345460
#8 [fffffc001fd67de0] do_try_to_free_pages at fffffc000034f87c
#9 [fffffc001fd67e20] kswapd at fffffc000034fa2c
#10 [fffffc001fd67e60] kernel_thread at fffffc00003107f0

In the case above, kswapd's ptbr references physical address
5bd8000, a page that has long since been freed and re-assigned to the
kmem slab area:

crash> task fffffc001fd64000 | grep ptbr
    ptbr = 0x2dec,
crash> ptob 0x2dec
2dec: 5bd8000
crash> kmem -p 5bd8000
      PAGE       PHYSICAL       MAPPING      INDEX CNT FLAGS
fffffc0000c212e0   5bd8000 0000000000000000    106 1 uptodate,slab

At the same time as the panic above, the 8 nfsd daemons and the two
idle tasks *all* contained ptbr values referencing physical addresses that
had been freed and re-used.

Thanks,
Dave Anderson
[EMAIL PROTECTED]
