Re: tagged queuing buggy drive?

Gerard Roudier Thu, 6 May 1999 19:50:37 -0700


On Thu, 6 May 1999, Kenneth D. Merry wrote:

> Well, the major bug in LXY4 isn't really related to tagged queueing
> specifically.  It's that the drive locks up under load.  But it's easiest
> to reach that sort of load by enabling tagged queueing.

Hmmm ...
You are probably right, but only partially in my opinion, since I think
you omit something important.

Some firmware, notably the Atlas ones, seems to complete SCSI
transactions immediately when TGQ is used and it is short of some
resource. Such a situation can be avoided by not enabling write caching
and using a reasonable number of tags (32 works for the Atlas II).

Such behaviour makes the old Atlas I L912 return QUEUE FULL with no more
than ZERO commands disconnected, for example, when it is heavily stressed
with write caching enabled.
I never got that with the Atlas II LXY4, but I did get some QUEUE FULL
with no more than 2 disconnected commands. When write caching is disabled,
the device does not exhibit this weirdness.

By the way, under Linux write caching does not help performance much,
since everything is cached by the kernel. This is probably very different
with O/Ses that do synchronous I/O, as seen by the kernel, for meta-data.
Note that disk write caching is just doing the asynchronous I/O that the
kernel has been too fearful to do by itself. ;-)

> Well, the main bug in LYK8 is the same bug that most every high-end Quantum
> drive has nowadays.  It continually returns Queue Full under high load, even
> when there are very few transactions queued to the drive.

This is not a bug, in my opinion, and it conforms to the SCSI
specifications.

The only questionable thing is returning QUEUE FULL with ZERO commands
disconnected. Such a situation is very painful to handle from the
kernel/driver. It is a kind of 'full and empty' condition that cannot be
handled without wasting either time or SCSI BUS bandwidth.
Anyway, it is still compliant with the SCSI specifications.

> We "fixed" it in FreeBSD by setting a lower bound on the number of
> transaction slots (24) for Atlas II and III drives.  Without that, we
> would automatically reduce the number of slots (or queue depth) to the
> system-wide minimum for devices with tagged queueing enabled (2
> transactions).

The ncr53c8xx and sym53c8xx drivers of Linux (which started from the ncr
driver, as you probably know) handle this situation.

In order to handle the device (logical unit) CCB queue depth
intelligently and dynamically, we must know the _actual_ number of
disconnected CCBs when the QUEUE FULL occurs. When a QUEUE FULL occurs, we
must immediately (and temporarily) reduce the device queue depth to
something not greater than the number of CCBs that were disconnected at
the time the QUEUE FULL was returned, or wait for a completion before
_actually_ queuing a new command to the device.

On the other hand, the _order_ of commands must be preserved.
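
To make the above concrete, here is a minimal sketch in C. It is not the
ncr53c8xx/sym53c8xx code: the struct, the lq_* function names and the
RAMP_UP_AFTER constant are all hypothetical, and it assumes for simplicity
that the command rejected with QUEUE FULL is the one most recently sent.
The slow ramp back up is only one possible reading of "temporarily".

```c
#include <assert.h>

#define RAMP_UP_AFTER 8   /* hypothetical: clean completions before depth grows by 1 */

/* Hypothetical per-logical-unit queue state. */
struct lun_queue {
    int max_depth;     /* configured tag limit, e.g. 32 */
    int cur_depth;     /* current, possibly reduced, limit */
    int disconnected;  /* commands the device currently holds */
    int next_cmd;      /* index of the next command to send, in order */
    int clean_done;    /* completions since the last QUEUE FULL or ramp-up */
};

static void lq_init(struct lun_queue *q, int max_depth)
{
    q->max_depth = max_depth;
    q->cur_depth = max_depth;
    q->disconnected = 0;
    q->next_cmd = 0;
    q->clean_done = 0;
}

/* May another command be sent to the device right now? */
static int lq_can_start(const struct lun_queue *q)
{
    return q->disconnected < q->cur_depth;
}

/* Send the next command in submission order; returns its index. */
static int lq_start(struct lun_queue *q)
{
    assert(lq_can_start(q));
    q->disconnected++;
    return q->next_cmd++;
}

/* The most recently sent command came back with QUEUE FULL. */
static void lq_queue_full(struct lun_queue *q)
{
    assert(q->disconnected > 0 && q->next_cmd > 0);
    q->disconnected--;  /* the rejected command was never accepted */
    q->next_cmd--;      /* it goes back to the head: order is preserved */
    /* Clamp the depth to what the device actually holds. With ZERO
     * disconnected there is no completion to wait for, so allow one
     * retry; a real driver would probably delay it. */
    q->cur_depth = q->disconnected > 0 ? q->disconnected : 1;
    q->clean_done = 0;
}

/* A disconnected command completed normally. */
static void lq_complete(struct lun_queue *q)
{
    assert(q->disconnected > 0);
    q->disconnected--;
    if (++q->clean_done >= RAMP_UP_AFTER && q->cur_depth < q->max_depth) {
        q->cur_depth++;  /* the reduction was only temporary */
        q->clean_done = 0;
    }
}
```

With 3 commands sent and the third rejected, cur_depth drops to 2 and
nothing new is started until one of the 2 disconnected commands completes;
the rejected command is then the first one sent again.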

I do not know about the FreeBSD heuristic, but if it is not based on what
I just mentioned, it is probably not that smart.
Maybe you would want to tell us how it works.

> I think you would certainly be better off upgrading to LYK8, since it won't
> hang like LXY4 does.

Indeed.

Gérard.

