On 19-Mar-01 Dag-Erling Smorgrav wrote:
> SMP box with a bleeding-edge -CURRENT kernel, patched to avoid the
> i586_bzero() problem:
> panic: mutex_enter: recursion on non-recursive mutex process lock @
> ../../i386/i386/trap.c:854
> cpuid = 1; lapic.id = 01000000
> Debugger("panic")

That panic is a later symptom of an earlier problem.  We recursed on the proc
lock while doing the PHOLD in trap_pfault(), before we actually handled the
page fault.
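
To make the recursion concrete, here is a rough userland sketch of the
sequence.  Everything named toy_* is made up for illustration and is not the
kernel's real mutex or PHOLD code.  It only assumes that the hold operation
briefly takes the proc lock, and that the faulting path already owns it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a kernel mutex (not struct mtx). */
struct toy_mutex {
    const char *name;
    const void *owner;   /* NULL when unlocked */
    int recursable;      /* nonzero if the owner may re-acquire it */
    int depth;           /* how many times the owner holds it */
};

enum toy_result { TOY_OK = 0, TOY_RECURSED = 1, TOY_BUSY = 2 };

/* Acquire m on behalf of self; report recursion instead of panicking. */
enum toy_result toy_mutex_enter(struct toy_mutex *m, const void *self)
{
    if (m->owner == NULL) {
        m->owner = self;
        m->depth = 1;
        return TOY_OK;
    }
    if (m->owner != self)
        return TOY_BUSY;        /* a real mutex would spin or block here */
    if (!m->recursable)
        return TOY_RECURSED;    /* this is where the kernel would panic */
    m->depth++;
    return TOY_OK;
}

/* Release one level of ownership. */
void toy_mutex_exit(struct toy_mutex *m)
{
    assert(m->depth > 0);
    if (--m->depth == 0)
        m->owner = NULL;
}

/* Assumed shape of a hold operation: lock, bump the count, unlock. */
enum toy_result toy_phold(struct toy_mutex *proc_lock, const void *self,
    int *hold_count)
{
    enum toy_result r = toy_mutex_enter(proc_lock, self);

    if (r != TOY_OK)
        return r;
    (*hold_count)++;
    toy_mutex_exit(proc_lock);
    return TOY_OK;
}

/* Signal delivery owns the lock, faults, and the fault path tries to hold. */
enum toy_result toy_fault_while_locked(void)
{
    struct toy_mutex proc_lock = { "process lock", NULL, 0, 0 };
    int self_token = 0;   /* its address identifies the current thread */
    int hold_count = 0;
    enum toy_result r;

    (void)toy_mutex_enter(&proc_lock, &self_token);       /* signal path */
    r = toy_phold(&proc_lock, &self_token, &hold_count);  /* fault path */
    toy_mutex_exit(&proc_lock);
    return r;
}
```

Calling toy_fault_while_locked() reports TOY_RECURSED, while toy_phold() on
an unlocked mutex succeeds.  That is why the recursion is only the messenger
here: the real question is why we faulted with the lock held at all.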
> CPU1 stopping CPUs: 0x00000001... stopped.
> Stopped at      Debugger+0x45:  pushl   %ebx
> db> show mutex
>         "panic" (0xc030b1e0) locked at ../../kern/kern_shutdown.c:544
>         "process lock" (0xd3f15000) locked at ../../i386/i386/machdep.c:625

This is in sendsig():

        p = curproc;
        psp = p->p_sigacts;
        if (SIGISMEMBER(psp->ps_osigset, sig)) {

>         "Giant" (0xc0309ac0) locked at ../../i386/i386/trap.c:1169
> db> trace
> Debugger(c027d5e1) at Debugger+0x45
> panic(c027c420,c027a154,c02997d0,356,d3f14ee0) at panic+0x144
> witness_enter(d3f15000,0,c02997d0,356) at witness_enter+0x355
> trap_pfault(d7345d4c,0,0) at trap_pfault+0x143
> trap(18,10,10,d7345fa8,0) at trap+0x978
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0, esp = 0xd7345d8c, ebp = 0xd7345ed8 ---
> (null)(805c3e0,e,d7345f10,0,4) at 0
> postsig(e) at postsig+0x40b

Hmmm.  An eip of 0 is bad.  This could be another instance of the bzero bug,
just in a different place.  You probably want to change the code that actually
sets *bzero to i586_bzero (and likewise for any other ops that use floating
point).  That code lives in i386/isa/npx.c.  It seems we use the fp regs for
copyin/copyout and bcopy as well.  For now, I would just change line 458 of
npx.c to say '#ifdef I586_CPU_XXX' as your temporary patch; then you don't
need to patch pmap_zero_page() anymore.
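
For anyone unfamiliar with the trick, here is a toy userland model of why
renaming the guard works.  It is not the npx.c code: the toy_* names,
generic_bzero and fp_bzero are invented for the example, and I am only
assuming the general shape (a function pointer that gets repointed at an
FP-based routine inside an #ifdef block).  Since nothing ever defines
I586_CPU_XXX, the guarded code is compiled out and the pointer keeps its
generic default:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*bzero_fn)(void *buf, size_t len);

/* Plain routine that never touches the floating point registers. */
static void generic_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}

#ifdef I586_CPU_XXX   /* renamed guard: never defined, so this is dropped */
/* Stand-in for an FP-register-based routine (hypothetical). */
static void fp_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}
#endif

/* The indirection every caller goes through (hypothetical name). */
bzero_fn toy_bzero_vector = generic_bzero;

/* Probe-time selection: only the guarded block switches routines. */
void toy_select_bzero(void)
{
#ifdef I586_CPU_XXX
    toy_bzero_vector = fp_bzero;
#endif
}

/* Zero a buffer through whichever routine is currently selected. */
void toy_zero(void *buf, size_t len)
{
    assert(toy_bzero_vector != NULL);
    toy_bzero_vector(buf, len);
}

/* Report whether the generic routine is still the one in use. */
int toy_uses_generic(void)
{
    return toy_bzero_vector == generic_bzero;
}
```

After toy_select_bzero() runs, toy_uses_generic() still returns nonzero, so
every caller ends up on the non-FP path without having to be patched one at a
time.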


