I have had similar problems, and so have several other people. What we all have
in common is the aic7xxx controller and an SMP machine.
I fixed the timeout problem by enabling the PCI bridge optimization in the
kernel configuration.
However, I still get filesystem errors under heavy I/O load. They are not
reported by the kernel, only by the application.
I am using two PII-400 CPUs and 1 GB of RAM, and the RAM seems to be OK (I
tested it).
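
Since errors of this kind never show up in the kernel log, one way to catch
them is a write-and-verify pass over the affected filesystem. What follows is
only a minimal sketch in Python, not the test I ran: the function name
write_and_verify, the file names, and the default sizes are all made up for
illustration.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, n_files=8, size_bytes=1 << 20):
    """Write n_files of random data, then re-read them and compare digests.

    Returns the list of paths whose re-read contents did not match.
    All names and sizes here are illustrative assumptions.
    """
    expected = {}
    for i in range(n_files):
        data = os.urandom(size_bytes)
        path = os.path.join(directory, "verify_%04d.bin" % i)
        with open(path, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # ask the OS to push the data to the disk
        expected[path] = hashlib.sha256(data).hexdigest()

    mismatches = []
    for path, digest in expected.items():
        with open(path, "rb") as fh:
            actual = hashlib.sha256(fh.read()).hexdigest()
        if actual != digest:
            mismatches.append(path)
    return mismatches


if __name__ == "__main__":
    # A temporary directory is used here; point it at the suspect disk instead.
    with tempfile.TemporaryDirectory() as tmp:
        print("mismatched files:", write_and_verify(tmp))
```

Note that the re-read may be served from the page cache rather than from the
disk, so a meaningful run needs to write more data than there is RAM, or
remount the filesystem between the write and the verify pass.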

Greetings, Dietmar

>----- Original Message -----
>From: [EMAIL PROTECTED]
>Subject: Problem w/ SMP and aic7xxx
>To: [EMAIL PROTECTED] , [EMAIL PROTECTED]
>Date: 13 Apr 1999 09:52
>
> Problem:
> When running a kernel which supports SMP, I receive errors of SCSI
> time-outs and resets under load (it doesn't take much ... copying
> files or compiling will do the trick).  I enabled as many verbose flags
> as possible within aic7xxx, and it seems as though SCSI commands (some
> of which have already completed) are being dropped (see the messages
> included below).
> 
> If I boot with a non-SMP kernel, I CANNOT reproduce the errors (maybe
> with just one CPU I can't generate as much traffic on the hard drives
> as with two).  Hence, I suspect SMP, IO-APIC, and/or the aic7xxx
> driver.
> 
> I have tried multiple kernels (2.0.36, 2.2.1, 2.2.3, and 2.2.5) with
> multiple compile options (e.g. PCI bridging, MTRR, and anything else
> that I found possibly related to the problem at hand).
> 
> Hardware:
>   HP Netserver LH Pro
>   128 Meg RAM (2 - 64 Meg DIMM)
>   2 - Pentium Pro 200's
>   2 - aic7880 on-board (PCI):  They share interrupt 11 and cannot be
>       changed to have unique interrupts for each (the EISA config
>       utility promptly configures both adapters to the same interrupt
>       when either is changed).
> 
> NOTE:  I noticed that the 1st CPU has 512K cache while the 2nd CPU only
> has 256K cache.
> 
> Software:
>   Kernel 2.2.5
>   Raid 0.90
>   aic7xxx v5.13
> 
> Except for the cache difference between the two CPUs, I have eliminated
> hardware problems (or at least I think I have) via multiple tests with
> and without SMP, diagnostic utilities, removing and swapping DIMMs, etc.
> 
> The following are a limited set of debug messages from the aic7xxx
> driver:
> Apr 10 18:40:11 lachesis kernel: scsi : aborting command due to timeout
> : pid 4274, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 48 00 e8 00
> 00 08 00
> Apr 10 18:40:11 lachesis kernel: (scsi0:0:1:0) Abort called for already
> completed command.
> Apr 10 18:40:11 lachesis kernel: scsi : aborting command due to timeout
> : pid 4275, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 49 04 90 00
> 00 08 00
> Apr 10 18:40:11 lachesis kernel: (scsi0:0:1:0) Aborting scb 10, flags
> 0x4
> Apr 10 18:40:11 lachesis kernel: (scsi0:0:1:0) SCB is currently active.
> Waiting on completion.
> Apr 10 18:40:11 lachesis kernel: scsi : aborting command due to timeout
> : pid 4277, scsi1, channel 0, id 4, lun 0 Write (10) 00 00 49 04 90 00
> 00 08 00
> Apr 10 18:40:11 lachesis kernel: (scsi1:0:4:0) Aborting scb 10, flags
> 0x6
> Apr 10 18:40:11 lachesis kernel: (scsi1:0:4:0) SCB found on waiting list
> and aborted.
> Apr 10 18:40:11 lachesis kernel: (scsi1:0:4:0) Aborting scb 10
> Apr 10 18:40:11 lachesis kernel: (scsi1:-1:-1:-1) 1 commands found and
> queued for completion.
> Apr 11 14:43:56 lachesis kernel: scsi : aborting command due to timeout
> : pid 13827, scsi1, channel 0, id 4, lun 0 Write (10) 00 00 30 00 10 00
> 00 08 00
> Apr 11 14:43:56 lachesis kernel: (scsi1:0:4:0) Aborting scb 11, flags
> 0x4
> Apr 11 14:43:56 lachesis kernel: (scsi1:0:4:0) SCB disconnected.
> Queueing Abort SCB.
> Apr 11 14:43:56 lachesis kernel: (scsi1:0:4:0) Abort message mailed.
> Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:0) SCB 13 abort delivered.
> Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:-1) Reset device, active_scb
> 2
> Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:-1) Cleaning up status
> information and delayed_scbs.
> Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:0:tag12) matches search
> criteria (scsi0:0:1:-1:tag255)
> Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:0:tag9) matches search
> criteria (scsi0:0:1:-1:tag255)
> Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:-1) Cleaning QINFIFO.
> Apr 11 14:43:56 lachesis kernel: (scsi0:0:1:-1) Cleaning waiting_scbs.
> 
> 
> 
> I have also received these two kernel-killing messages:
> 
> end_scsi_request: buffer-list destroyed
> .
> .
> .
> Kernel panic: Inactive in scsi_request_queueable
> 
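
To see which controller and target the timeouts in the quoted messages hit,
the log can be tallied with a short script. This is only a sketch in Python:
the regular expression is derived from the message format quoted above, and
whether other driver or kernel versions print it the same way is an
assumption. The names count_timeouts and SAMPLE are made up here; SAMPLE is
copied from two of the quoted messages.

```python
import re
from collections import Counter

# Pattern taken from the quoted "aborting command due to timeout" lines.
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout\s*: pid (\d+), "
    r"scsi(\d+), channel (\d+), id (\d+), lun (\d+)"
)


def count_timeouts(log_text):
    """Count timeout aborts per (host, channel, id, lun) tuple."""
    # Undo mail wrapping: drop quote markers and join continuation lines.
    flat = re.sub(r"\s*\n>?\s*", " ", log_text)
    counts = Counter()
    for match in TIMEOUT_RE.finditer(flat):
        _pid, host, channel, target, lun = (int(g) for g in match.groups())
        counts[(host, channel, target, lun)] += 1
    return counts


SAMPLE = """\
> Apr 10 18:40:11 lachesis kernel: scsi : aborting command due to timeout
> : pid 4274, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 48 00 e8 00
> 00 08 00
> Apr 10 18:40:11 lachesis kernel: scsi : aborting command due to timeout
> : pid 4277, scsi1, channel 0, id 4, lun 0 Write (10) 00 00 49 04 90 00
> 00 08 00
"""

if __name__ == "__main__":
    for key, n in sorted(count_timeouts(SAMPLE).items()):
        print("scsi%d channel %d id %d lun %d: %d timeout(s)" % (key + (n,)))
```

If the timeouts turn out to be spread over both controllers, as they are in
the sample, that would point away from a single bad disk or cable and fit
with the suspicion about SMP or the shared interrupt.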

-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]
