Hi all,

I have finally been able to fix the Linux kernel crash that occurs on
the Sparc target (sun4m) when doing intensive disk I/O (see the dmesg
log below).

slavio_pic_set_irq() in slavio_intctl.c calls slavio_check_interrupts()
when an interrupt is activated, but also when an interrupt is
deactivated. In very rare conditions this can cause a spurious
interrupt that disturbs the ESP driver and leads to a kernel crash.

From what I have been able to trace, it occurs when an interrupt is
being serviced and an interrupt with a lower level is cleared before
the interrupt routine in the target disables the first interrupt. To
have a bad effect on the ESP driver, it must also occur while a DMA
transfer is scheduled. That explains why this bug is not so easy to
reproduce (it usually shows up after half an hour to two hours under
intensive disk I/O, and after up to 24 hours with very little disk
I/O), though it is very annoying.

Note that all other functions in this file that activate and
deactivate interrupts only call slavio_check_interrupts() in the
activation case, so they are already correct.

The patch below fixes the problem. With this patch I have now been
running a Sparc target under intensive disk I/O for 24 hours without a
crash.

Cheers,
Aurelien


esp0: !BSERV after data, probably to msgout
esp0: Aborting command
esp0: dumping state
esp0: dma -- cond_reg<a4000211> addr<f0251000>
esp0: SW [sreg<00> sstep<04> ireg<18>]
esp0: HW reread [sreg<83> sstep<00> ireg<10>]
esp0: current command [tgt<00> lun<00> pphase<DATAOUT> cphase<DATAOUT>]
esp0: disconnected
esp0: Aborting command
esp0: dumping state
esp0: dma -- cond_reg<a4000210> addr<f0251000>
esp0: SW [sreg<00> sstep<04> ireg<18>]
esp0: HW reread [sreg<03> sstep<00> ireg<10>]
esp0: current command [tgt<00> lun<00> pphase<UNISSUED> cphase<UNISSUED>]
esp0: disconnected
esp0: Resetting scsi bus
esp0: SCSI bus reset interrupt
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 0000000d
tsk->{mm,active_mm}->pgd = fc048800
              \|/ ____ \|/
              "@'/ ,. \`@"
              /_| \__/ |_\
                 \__U_/
apt-get(4250): Oops [#1]
PSR: 04400fc6 PC: fe61f128 NPC: fe61f12c Y: 00000000    Not tainted
PC: <esp_do_data_finale+0x3b4/0x3f8 [esp]>
%G: f2cb4000 ffffffff 00000014 fd0da000 00000000 00000020 f2cb4000 00000001
%O: fe620800 f79d8800 00000010 00000008 f00d8eac f0234000 f2cb5b18 fe61edd0
RPC: <esp_do_data_finale+0x5c/0x3f8 [esp]>
%L: f79f3600 00000000 00000000 f7956500 00000000 ea7afb00 f3004000 00989680
%I: f021529c 00000000 00000000 00000000 00000000 fff00000 f2cb5b80 fe61de10
Caller[fe61de10]: esp_work_bus+0x64/0x6c [esp]
Caller[fe61f7e8]: esp_intr+0x1e0/0x310 [esp]
Caller[f0013160]: handler_irq+0x94/0xd4
Caller[f0010bd8]: patch_handler_irq+0x8/0x24
Caller[f019b744]: here+0x18/0x90
Caller[f019c538]: do_nanosleep+0x44/0x88
Caller[f0046af8]: hrtimer_nanosleep+0x30/0x130
Caller[f0046c74]: sys_nanosleep+0x7c/0x94
Caller[f0011634]: syscall_is_too_hard+0x3c/0x40
Caller[5035a36c]: 0x5035a374
Instruction DUMP: c22420ec 8400a014 c42420e8 <c200a010> c22420e4 c200a00c c22420e0 c20e203a 82086007
Kernel panic - not syncing: Aiee, killing interrupt handler!
<0>Press Stop-A (L1-A) to return to the boot prom


--- hw/slavio_intctl.c	2007-02-06 00:01:54.000000000 +0100
+++ hw/slavio_intctl.c	2007-03-14 13:50:18.000000000 +0100
@@ -293,6 +293,7 @@
             if (level) {
                 s->intregm_pending |= mask;
                 s->intreg_pending[s->target_cpu] |= 1 << pil;
+                slavio_check_interrupts(s);
             }
             else {
                 s->intregm_pending &= ~mask;
@@ -300,7 +301,6 @@
             }
         }
     }
-    slavio_check_interrupts(s);
 }
 
 void slavio_pic_set_irq_cpu(void *opaque, int irq, int level, unsigned int cpu)

-- 
  .''`.  Aurelien Jarno             | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   [EMAIL PROTECTED]          | [EMAIL PROTECTED]
   `-    people.debian.org/~aurel32 | www.aurel32.net

_______________________________________________
Qemu-devel mailing list
Qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel
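
For readers who want to see the control-flow change in isolation, here is a minimal standalone sketch of what the patch alters. It is a toy model only, not the QEMU code. The ModelIntctl type and the model_* functions are hypothetical names invented for illustration, the state is reduced to a single pending mask, and model_check_interrupts() merely counts how often the interrupt output would be recomputed. Only the placement of that call mirrors the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for the controller state.
 * This is NOT the real state structure from hw/slavio_intctl.c. */
typedef struct {
    uint32_t pending;        /* one bit per interrupt source */
    unsigned reevaluations;  /* times the interrupt output was recomputed */
} ModelIntctl;

/* Stand-in for slavio_check_interrupts(): here it only counts calls. */
void model_check_interrupts(ModelIntctl *s)
{
    s->reevaluations++;
}

/* Pre-patch shape: recompute on activation AND on deactivation. */
void model_set_irq_old(ModelIntctl *s, int irq, int level)
{
    assert(irq >= 0 && irq < 32);
    uint32_t mask = UINT32_C(1) << irq;

    if (level) {
        s->pending |= mask;
    } else {
        s->pending &= ~mask;
    }
    model_check_interrupts(s);
}

/* Post-patch shape: recompute on activation only. */
void model_set_irq_patched(ModelIntctl *s, int irq, int level)
{
    assert(irq >= 0 && irq < 32);
    uint32_t mask = UINT32_C(1) << irq;

    if (level) {
        s->pending |= mask;
        model_check_interrupts(s);
    } else {
        /* Clearing a source no longer triggers a recomputation. */
        s->pending &= ~mask;
    }
}
```

Feeding both variants the sequence "raise 5, raise 3, lower 3" leaves the same pending mask in each. The pre-patch shape recomputes three times and the patched shape twice. The dropped recomputation is the one on deactivation, which is where the report places the spurious interrupt.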