Jeff Royle wrote:
LI Xin wrote:
Jeff Royle wrote:
Jeff Royle wrote:
I could use some advice on an issue I have had with my RAID
controller.
I am not running much on the system yet: postfix, pf + pflogd,
rlogind, ssh, bsnmp and ntpd. While I was just reading a file with
less, the system stopped responding. I thought it was the network
interfaces, but I was able to ping the interface. Once I plugged a
monitor into the system I saw this (roughly):
AAC0: COMMAND <SOME HEX> TIMEOUT AFTER X number of seconds
Not good :)
A reset of the system resolved the issue and it booted fine. Since
the controller stopped responding, nothing was recorded to my logs.
Now I have to figure out how to prevent that from happening again.
Basic run down on the system and some history...
P4 3.2GHz
Asus P5MT-S MB
2 x 1GB DDR2 667 memory
Adaptec 2130SLP RAID Controller + battery backup module
2 Seagate Ultra320 73GB 15k RPM (mirrored)
I have run this same hardware while testing 6.2-BETA3, RC-1 and RC-2
without this issue. I was using the driver released by Adaptec
while testing the pre-release installs
(http://www.adaptec.com/en-US/speed/raid/aac/unix/aacraid_freebsd6_drv_b11518_tgz.htm).
You could say I am fairly confident in the hardware itself. I have
put this system through a lot of testing since BETA3.
The 6.2 release kernel has not been customized much; I just
pulled out the drivers I would never use. To be safe I kept
nearly all SCSI devices/card models in as I continued my
testing of 6.2 release. Right now I am going to try taking out aac and
aacp, then try the driver I used in my previous tests. However,
since I have run a week without this issue, it will be hard or impossible
to tell whether this did anything to resolve it... I almost want a crash
on the old driver :)
So I need some advice... How best do I debug this issue?
Thanks in advance for any direction you guys can offer me.
Cheers,
Jeff
It appears the driver I was using in my pre-release testing is newer
than the release driver.
Stock driver in 6.2r dmesg:
aac0: <Adaptec SCSI RAID 2130S> mem
0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
aac0: New comm. interface enabled
aac0: Adaptec Raid Controller 2.0.0-1
aacp0: <SCSI Passthrough Bus> on aac0
Currently using:
aacu0: <Adaptec SCSI RAID 2130S> mem
0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
aacu0: New comm. interface enabled
aacu0: Adaptec Raid Controller 2.0.7-1
aacpu0: <SCSI Passthrough Bus> on aacu0
Going to continue testing with the newer driver.
I have some preliminary work on merging the Adaptec driver:
http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518
But one of the reviewers has advised me to request broader testing,
especially against old cards and CLI tools, so I have held the commit
for now.
Cheers,
Well, the driver patched fine, no issues to report there.
Performance with bonnie and simple dd tests is where I expected it to
be, based on my previous testing.
So far the issue I noted above with the TIMEOUT error has not shown
itself again; time will tell on this one, I think.
However, I have encountered an intermittent bug on boot.
Sometimes, say every 5-10 boots, the system will hang while probing
the SCSI bus for the drives. I have seen this happen once before on the
aacdu 2.0.7-1 binary driver I was using in my 6.2-RC 1 / 6.2-RC 2
testing. The problem is now happening a fair bit more often.
Here is where it hangs...
Hung dmesg output:
--- snip ---
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcd7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)
--- end snip ---
The system does not continue on and probe the drives, as seen in a
normal boot dmesg:
--- snip ---
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)
pass0 at aacp0 bus 0 target 0 lun 0
pass0: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
pass0: 3.300MB/s transfers
pass1 at aacp0 bus 0 target 3 lun 0
pass1: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
pass1: 3.300MB/s transfers
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/aacd0s1a
--- end snip ---
In an effort to resolve this I increased the SCSI delay in the kernel
from 5 seconds to 10 seconds (the option takes milliseconds):
options SCSI_DELAY=10000
It *may* have helped on one of my reboot tests: I thought it was going
to hang again, but it proceeded. However, it definitely did not solve
the issue.
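For anyone repeating this test without rebuilding the kernel, here is a
minimal sketch of the same delay expressed as a loader tunable. It
assumes the kern.cam.scsi_delay tunable is available on the release in
use and that it is set from /boot/loader.conf; neither is confirmed in
this thread. Like SCSI_DELAY, the value is in milliseconds.

```
# /boot/loader.conf
# Assumption: kern.cam.scsi_delay is supported by this kernel.
# Value is in milliseconds: 10000 = wait 10 seconds before probing
# the SCSI bus, matching "options SCSI_DELAY=10000" above.
kern.cam.scsi_delay="10000"
```

Setting it this way makes it easy to try several delay values across
reboot tests, since only the one line changes between runs.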
Once I am back in the office I will see if I can get some debug output
for you.
Cheers,
Jeff
Update ---
I think the TIMEOUT error has been resolved by the aac 2.0.7-1 patch.
None of my tests has reproduced the timeout since.
However, the hard lock on boot while probing the hard drives continues.
In another post someone suggested disabling the fdc device, as there
is a bug in the Intel chipset that can cause issues. I tried that,
since I have seen the floppy seek for an unusually long time. No change.
I am assuming at this point this bug is not specific to the aac driver
since I saw it at least once on the binary 2.0.7-1 driver from Adaptec.
Last reboot test result: hard lock on reboot #10.
Unfortunately it will not break into the debugger to get more detailed
information.
The last thing I am going to try comes from a recent post suggesting
hint.apic.0.disabled=1 might help. That was to resolve another boot
issue, not exactly the same one I have, but I am willing to try almost
anything at this point.
I admit I don't really understand what exactly hint.apic.0.disabled
does. My assumption is it disables all APIC, we shall have to see :)
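A minimal sketch of how that hint would be set, assuming it goes in
/boot/loader.conf. The description in the comments is a general
understanding of the hint, not something verified on this hardware: it
should make the kernel skip the local and I/O APICs and fall back to the
legacy PIC for interrupt routing, and since SMP relies on the APIC, the
second logical CPU ("SMP: AP CPU #1 Launched!" in the dmesg above) would
likely no longer be started.

```
# /boot/loader.conf
# Assumption: disabling the APIC falls back to legacy PIC interrupt
# routing and also turns off SMP on this system.
hint.apic.0.disabled="1"
```

For a one-off test the same hint can be entered at the loader prompt
with "set hint.apic.0.disabled=1" followed by "boot", which avoids
leaving the setting in place if it makes no difference.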
Cheers,
Jeff
_______________________________________________
freebsd-stable mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable