Multiple responses to responses in one message; I hope no one gets confused...
First off, I did some testing with plain SCSI, no RAID. (Yes, I do know
this is a RAID mailing list, but everyone is being so helpful that I hope
I'm forgiven ;-) I wrote to all 4 of my SCSI drives simultaneously, as fast
as I could (via cat /dev/zero > drive/file). This produced no errors, and I
stopped it after about 15 minutes (approx 500MB per disk). I did receive
this in the log during the operation, though:
Feb 1 22:48:13 dual kernel: (scsi1:0:0:0) Performing Domain validation.
Feb 1 22:48:13 dual kernel: (scsi1:0:0:0) Successfully completed Domain validation.
Feb 1 22:48:18 dual named[968]: Cleaned cache of 0 RRs
Feb 1 22:48:18 dual named[968]: USAGE 949412898 949358898 CPU=0.12u/0.1s CHILDCPU=0u/0s
Feb 1 22:48:18 dual named[968]: NSTATS 949412898 949358898 A=7 MX=2 ANY=3
Feb 1 22:48:18 dual named[968]: XSTATS 949412898 949358898 RR=18 RNXD=1 RFwdR=13 RDupR=0 RFail=0 RFErr=0 RErr=0 RAXFR=0 RLame=0 ROpts=0 SSysQ=4 SAns=3 SFwdQ=6 SDupQ=13 SErr=0 RQ=12 RIQ=0 RFwdQ=0 RDupQ=4 RTCP=0 SFwdR=13 SFail=0 SFErr=0 SNaAns=3 SNXD=0
Is this normal? Should it just periodically appear?
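For anyone wanting to repeat the write test described above, here is a rough
sketch of a comparable one. It is not the exact command that was run: it uses
dd with a fixed size instead of an open-ended cat so that it stops on its own,
and the names write_test and zero.tmp are made up for illustration.

```sh
#!/bin/sh
# Sketch only: write SIZE_MB megabytes of zeros to a scratch file in each
# directory given, all at the same time, then wait for the writers to finish.
# write_test and zero.tmp are hypothetical names; pass your own directories.
write_test() {
    size_mb=$1
    shift
    for dir in "$@"; do
        # bs=1048576 is 1 MiB spelled out in bytes, so any dd accepts it.
        dd if=/dev/zero of="$dir/zero.tmp" bs=1048576 count="$size_mb" 2>/dev/null &
    done
    wait    # returns once every background dd has exited
}
```

Something like "write_test 500 /mnt/a /mnt/b /mnt/c /mnt/d" (placeholder mount
points, one per drive) would write about 500MB per disk. Note that this sketch
hides dd's own output and does not check its exit status, so watch the system
log in another terminal for SCSI messages while it runs.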
When I got all my timeouts earlier, I was doing two things that I haven't
tried again yet. One, I had all my SCSI drives in use as swap. Two, I was
trying to back up to my SCSI tape drive. I'll try these again later when I'm
ready for another crash ;-( As it is right now, everything seems stable,
albeit a bit slow.
Before I forget, I just wanted to say thanks to everyone for their help. I
really appreciate it. Any further ideas would be gratefully accepted.
Now follow some responses to responses....
-----Original Message-----
From: Stephen Waters
> or you could just configure the transfer rate to be one notch lower than
> your current level. had to do that with my 4 U2W drives in a hotswap box
> w/ a tekram dc390u2b (symbios chipset).
True, but I did some benchmarking today using both "hdparm -tT" and another
simple test, and I'm finding terrible throughput. My one drive that is on a
dedicated bus and should be able to do up to 40MB/s is only getting about 9
MB/s. An older SCSI drive (on a different bus) that should be able to do
5MB/s is only getting 2MB/s. This makes me think something else is wrong.
I may try your suggestion anyway and see if reducing the speed helps at all.
Interestingly, Linux says the 40MB/s drive is running at 32MB/s. Either
way, that is far more than the measured 9 MB/s.
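To put these figures side by side, the arithmetic can be sketched as below.
The helper names mbps and pct_of_rate are made up for illustration, and the
inputs are the approximate numbers quoted in this message (15 minutes taken
as 900 seconds), so the results are rough. Keep in mind that the bus rate is
a ceiling for the whole bus, not a promise of what one drive can sustain.

```sh
#!/bin/sh
# Sketch only: mbps and pct_of_rate are hypothetical helper names.

# Average rate in MB/s for MEGABYTES moved in SECONDS.
mbps() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.2f\n", mb / s }'
}

# MEASURED rate as a percentage of the RATED bus speed (same units for both).
pct_of_rate() {
    awk -v m="$1" -v r="$2" 'BEGIN { printf "%.1f\n", m * 100 / r }'
}
```

With the numbers above, "pct_of_rate 9 40" prints 22.5 and "pct_of_rate 2 5"
prints 40.0, so both drives are well under their rated speeds. By the same
arithmetic, "mbps 500 900" prints 0.56: the earlier write test averaged only
about half a megabyte per second per disk with all four disks writing at once.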
From: Peter Pregler
> I had similar problems (actually your messages could be a cut-and-paste of
> my old logs) with my box at the beginning. The actual problem was that
> the scsi-bus did not fulfill the specifications. Replacing some hardware
> (hot-swap boxes) solved it. BTW, all worked well under DOS in the
> test-environment shipped. But as soon as linux got on the box and did
> _really_ use the bandwidth on the bus the troubles showed up (timeouts,
> renegotiation, slowdown ...).
Hmm, good info here. But I don't know how it helps. I know and acknowledge
that some of my drives are old/slow but shouldn't they still work without
errors? When you say you replaced "hot-swap boxes" you are talking about
drives, not SCSI cards, right?
From: Mike Black
> Try turning off SYNC mode on ALL your drives in the SCSI BIOS. I had a
> similar problem this last weekend with 2.2.14 and 5.1.21 AIC-7xxx and async
> mode fixed it. I had been previously running SYNC mode with no problems,
> but I added two new drives and couldn't get the mkraid to finish without
> hitting the same errors you're seeing. All my drives are now running async
> and are happy. Slow, but happy. Previously (on 2.2.14 and prior) they were
> running happy in sync mode but I was upgrading this weekend and had to do a
> mkraid again (which really bangs the SCSI bus during resync). I tried
> several times with different sync/async combos with no joy. I thought this
> was just my config on the one machine but, this morning I had a problem on
> another box which is a 3x50G 2940U2W setup. I accidentally left it NFS
> mounted and during the attempted backup it also hit some SCSI timeouts -- it
> recovered though. This box has been flawless for a long time (I think it
> must've been the network I/O causing this -- no proof yet -- just
> suspicion).
Oh-ho. That's an interesting discovery. I'm, of course, hesitant to reduce
my throughput, especially since it is so low to start with (see above),
but that is better than timeouts, etc. I'll give it a shot.
--Rainer