Hello.  I recently constructed a workstation including a Diamond FirePort
40 SCSI card and a 9.1 GB IBM Ultrastar drive.  I have begun to suspect
that the drive has major problems, but it's strange enough that I have to
ask some experts before I come to any conclusions.  I'm running Linux
2.2.10.  The problems usually happen within the first two hours of use,
though it varies.

There seem to be two sets of related errors, both of which are fatal.  In
one case, the kernel reports that a SCSI command was aborted due to a
timeout.  Sometimes the system appears to go back to normal at this
point, while other times the timeout message repeats, apparently
endlessly.  When it repeats, the system is completely hung, and the load
increases because each process that tries to touch the disk is blocked.

The second type of problem appears to involve a SCSI phase change (2-3;
what is a phase change?).  Once the phase change happens, weird stuff
happens to the disk.  All the logical partitions vanish and are replaced
by one giant partition with weird attributes.  The filesystems living on
these partitions are completely corrupted.  File permissions show up as
being completely strange, and attempts to execute binary files result in
various crashes (bus errors, seg faults, etc).  Attempts to write to
the filesystems yield "Attempt to write past end of device".  The primary
partitions remain fully intact and exhibit no problems at all.  The
filesystems go back to normal after a reboot.  This is the strangest
problem I've ever run across.  It leads me to believe that there's a
problem with the sym53c8xx driver, except:

I borrowed a BusLogic card from work over the weekend, and tried using it
instead of the FirePort.  Things worked very well for most of the
weekend, with a couple of notable exceptions.  Suddenly, with no warning
at all, I got the same problems with the logical partitions getting munged
and a bunch of errors about attempting to access past the end of the
device.  There was no message regarding a phase change, but that might
simply be because the driver doesn't print such info.  I used the 'magic
SysRq key' to sync the disks, and powered down. A couple hours later, when
I tried using the system again, I got timeout errors on boot.  Because I
was using the BusLogic card and drivers, I got slightly different errors
(simply because the driver prints out different messages).  An operation
would time out, so the driver would attempt it again, at which point it
would time out once again.  So the driver would try again (printing out a
message about 'trying harder').  This failed as well, for the same
reasons, and the driver printed a message about a probably unrecoverable
hang in either the SCSI host or a device.  This was before any filesystems
were mounted, so I simply power-cycled, and the system came up OK after
that.
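
One way to pin down the moment the logical partitions change would be to
snapshot /proc/partitions at intervals and diff the snapshots.  The Python
below is only a minimal sketch of that idea: the function names are
hypothetical, and it assumes the usual "major minor #blocks name" column
layout, which may differ between kernels.

```python
import time


def parse_partitions(text):
    """Parse /proc/partitions-style text into {device name: size in blocks}.

    Assumes the columns are: major, minor, #blocks, name.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric major number; this skips the
        # header row and blank lines.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        table[fields[3]] = int(fields[2])
    return table


def diff_partitions(before, after):
    """Return (vanished, appeared, resized) device names, each sorted."""
    vanished = sorted(set(before) - set(after))
    appeared = sorted(set(after) - set(before))
    resized = sorted(
        name for name in set(before) & set(after)
        if before[name] != after[name]
    )
    return vanished, appeared, resized


def watch(path="/proc/partitions", interval=5.0):
    """Poll the partition list and print a timestamped line on any change.

    Runs until interrupted; the path and interval are illustrative defaults.
    """
    with open(path) as f:
        previous = parse_partitions(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = parse_partitions(f.read())
        vanished, appeared, resized = diff_partitions(previous, current)
        if vanished or appeared or resized:
            print(time.strftime("%H:%M:%S"),
                  "vanished:", vanished,
                  "appeared:", appeared,
                  "resized:", resized)
        previous = current
```

Calling watch() from a shell on a console and leaving it running would give
a timestamp to compare against the kernel's SCSI messages.  It only shows
what the kernel currently believes about the partitions, not what is
actually on the disk.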

I have tried a different cable at points in my debug process, so I doubt
that's the problem.  I believe I have the chain terminated properly, but
I'm pretty new to SCSI so I don't know that for a fact.  One end of the
cable is plugged into the SCSI card, the next connector is plugged into the
drive, there's an empty connector, then the terminator on the last one.

The partition table symptoms make this look like a software problem,
but I've used entirely different SCSI chipsets/drivers and seen the exact
same problem.  Additionally, we use the BusLogic cards at work without
troubles, and a friend has the FirePort 40, and has never experienced
problems.  I can't imagine it's a problem with a card, since I've tried
two.  So that leaves the drive.  It's an LVD drive, but it has a jumper to
switch to SE mode, which I've set.  It's only about 2 weeks old, but I've
been experiencing this problem the whole time I've had it.

Please get back to me if you can offer any info or additional debugging
tips for me.  This is an incredibly frustrating problem and it has
rendered my new workstation useless.  Any replies will be appreciated.

Thanks.
noah

