Just thought I would drop a note and tell people where things stand.
 
    For the most part, things have been kind of quiet lately.  I am trying to polish off a few remaining issues.  I think I have gdth licked, and I think I have the host blocking feature added back again; at this point I am waiting to hear back from the people testing.  In the background I am starting to clean up a few other things - I have split part of scsi.c into a new file (and in the long run there will probably be several others as well).  Unfortunately scsi.c has served as a sort of repository for a bunch of loosely related things, and both readability and maintainability have suffered.  I am appending the current patchset, in the event that anyone else wants to look at it.
 
    I was talking with someone about how to implement automatic spinup of spun-down drives, and I realized that there is a design flaw in the new queueing code.  As things stand, we pre-allocate Scsi_Cmnd structures based upon the queue depth of the device, so for a disk with a queue depth of 1 there is only one Scsi_Cmnd structure dedicated to that device.  When a user does something which causes an ioctl to be generated, the Scsi_Cmnd structure is allocated in the ioctl code and then passed via scsi_do_cmd() or scsi_wait_cmd() into the mid-layer, which inserts the command at the tail of the queue.
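
    To make that concrete, here is a minimal model of the arrangement in C.  The types and the allocate_cmnd() helper are simplified stand-ins of my own, not the actual mid-layer code:

        typedef struct scsi_cmnd {
            struct scsi_cmnd *next;     /* links the per-device free list */
            int in_use;
        } Scsi_Cmnd;

        typedef struct scsi_device {
            int queue_depth;            /* e.g. 1 for a plain disk */
            Scsi_Cmnd *free_list;       /* queue_depth structures,
                                         * pre-allocated at attach time */
        } Scsi_Device;

        /* Take one of the pre-allocated Scsi_Cmnd structures, or NULL if
         * they are all busy.  With queue_depth == 1, an ioctl that calls
         * this takes the only structure the device will ever have. */
        static Scsi_Cmnd *allocate_cmnd(Scsi_Device *dev)
        {
            Scsi_Cmnd *cmd = dev->free_list;
            if (cmd) {
                dev->free_list = cmd->next;
                cmd->in_use = 1;
            }
            return cmd;
        }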
 
    The queue handler function currently only looks at the head of the queue.  It keeps queueing things as long as resources are available to queue more commands, and one of the resources that gets used up, of course, is the supply of Scsi_Cmnd structures.  The assumption is that every allocated structure is actually queued to the device, so if we run out, the queue handler can simply return without doing anything; when one of the currently running commands completes, we make another stab at queueing the request at the head of the queue.
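
    Continuing the model, the queue handler looks roughly like this (request_fn() and dequeue_and_issue() are again illustrative names, not the real routines):

        struct request {
            struct request *next;       /* the device's request queue */
        };

        /* Removes the request at the head and sends it to the adapter. */
        extern void dequeue_and_issue(Scsi_Device *dev, struct request **head,
                                      Scsi_Cmnd *cmd);

        static void request_fn(Scsi_Device *dev, struct request **head)
        {
            while (*head) {
                Scsi_Cmnd *cmd = allocate_cmnd(dev);
                if (!cmd)
                    return;     /* All Scsi_Cmnd structures busy.  We assume
                                 * they are all out at the device and that a
                                 * completion will call us again - exactly
                                 * the assumption the ioctl case violates. */
                dequeue_and_issue(dev, head, cmd);
            }
        }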
 
    Consider a disk that is fairly active and has a queue depth of 1.  Imagine that a user issues an ioctl - this allocates the only Scsi_Cmnd structure and drops it at the tail of the queue.  The scsi_request_fn() function will look at the head of the queue, be unable to queue that request because there are no Scsi_Cmnd structures available, and return.  Hence a deadlock arises: the I/O at the head of the queue is waiting for the Scsi_Cmnd held by the ioctl, and the ioctl's command is stuck behind that I/O at the tail, never to be issued.
 
    My guess at the moment is that this situation isn't all that common - nobody has reported it yet, but it is only a matter of time before somebody stumbles across it.   The reason it hasn't come up is that it would only appear for a disk or a cdrom that is fairly busy, has a small queue depth, and for which an ioctl is issued.  The trick of course is to figure out how to fix it. 
 
    In the short term, I could easily add a simple hack to detect this situation and recover from it.  No big deal here, and until a more architecturally satisfying solution arises, this will do.
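
    For what it's worth, here is the sort of detection hack I have in mind - entirely untested, and the cmnd field added to struct request here is invented purely for illustration.  If the head of the queue is blocked for lack of a Scsi_Cmnd, we scan for a queued request that already owns one and run it out of order:

        struct request {
            struct request *next;
            Scsi_Cmnd *cmnd;            /* non-NULL for ioctl-style requests
                                         * that brought their own Scsi_Cmnd
                                         * (invented field, illustration only) */
        };

        extern void remove_and_issue(Scsi_Device *dev, struct request **head,
                                     struct request *rq, Scsi_Cmnd *cmd);

        static void request_fn_with_recovery(Scsi_Device *dev,
                                             struct request **head)
        {
            while (*head) {
                struct request *rq = *head;
                Scsi_Cmnd *cmd = allocate_cmnd(dev);
                if (!cmd) {
                    /* Head is blocked; look for a request that already
                     * owns a Scsi_Cmnd rather than waiting for a
                     * completion that can never arrive. */
                    for (rq = *head; rq; rq = rq->next)
                        if (rq->cmnd)
                            break;
                    if (!rq)
                        return;         /* genuinely out of resources */
                    cmd = rq->cmnd;
                }
                remove_and_issue(dev, head, rq, cmd);
            }
        }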
 
    In the longer term, there are several things that come to mind.  First of all, upper level drivers probably don't actually need a Scsi_Cmnd structure for anything except completion processing.  As things currently stand, an upper level driver simply allocates the thing and then calls scsi_do_cmd/scsi_wait_cmd without having done much to the Scsi_Cmnd structure at all.  Once the command is complete, the upper level driver usually releases it after looking at the status code.  There are only a handful of things that the upper level driver actually needs - usually just the completion code, and in some cases the sense data in the event that something went wrong.  I could come up with something analogous to scsi_do_cmd or scsi_wait_cmd that doesn't even take a Scsi_Cmnd structure, but at this point I don't know to what degree it is possible to replace scsi_do_cmd/scsi_wait_cmd.  If a 100% replacement is feasible, then this would be a complete solution.
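
    Something along these lines is what I have in mind - scsi_wait_req() and the result structure are invented names, and the sense buffer size is only illustrative.  The caller supplies just what it actually consumes, and the mid-layer allocates and frees the Scsi_Cmnd internally, at the point where the request reaches the head of the queue, so a queued ioctl can no longer pin the device's only structure:

        struct scsi_req_result {
            int result;                 /* completion/status code */
            unsigned char sense[16];    /* valid only when something
                                         * went wrong */
        };

        /* Blocking flavor, analogous to scsi_wait_cmd(); a non-blocking
         * counterpart with a done() callback would parallel scsi_do_cmd(). */
        int scsi_wait_req(Scsi_Device *dev,
                          const unsigned char *cdb, int cdb_len,
                          void *buffer, int buflen,
                          int timeout, int retries,
                          struct scsi_req_result *out);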
 
    The second thought that comes to mind is that we really need to get away from pre-allocating Scsi_Cmnd structures for devices.  There should be a general purpose pool allocated for the host instead (to prevent starvation, there might be one Scsi_Cmnd pre-allocated per device, with the rest coming from the general purpose pool - but that's just an idea).  The general idea is that managing the queue depth would be done at a different level, so that it is less likely that we get into the trouble I described above.  While the per-device scheme does have a known problem of over-allocating Scsi_Cmnd structures, I don't like the idea of allocating consumables before they are actually needed, and a pool sized just to dodge this bug would be more of a band-aid.
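
    A rough sketch of what that might look like, extending the model from above with an invented per-device "reserved" pointer:

        typedef struct scsi_host_pool {
            Scsi_Cmnd *free_list;       /* shared by every device on the host */
        } Scsi_Host_Pool;

        static Scsi_Cmnd *pool_allocate(Scsi_Host_Pool *pool, Scsi_Device *dev)
        {
            Scsi_Cmnd *cmd = NULL;

            if (dev->reserved && !dev->reserved->in_use) {
                cmd = dev->reserved;    /* the per-device reserve guarantees
                                         * at least one command can always
                                         * be issued - no starvation */
            } else if (pool->free_list) {
                cmd = pool->free_list;  /* otherwise draw from the shared pool */
                pool->free_list = cmd->next;
            }
            if (cmd)
                cmd->in_use = 1;
            return cmd;
        }

    Queue depth would then become a throttling decision made when issuing commands to the host, rather than a hard limit baked into the allocation itself.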
 
    The last possibility is that for things like ioctl it is *possible* that we could allocate an additional Scsi_Cmnd from the general kernel memory pool and hook it in.  In general you don't want to call kmalloc() anywhere in a code path that I/O during swapping might depend on, but ioctls generally don't fall into this category.  I am not that wild about this one.
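
    Sketched out, with the caveat that from_kmalloc is an invented flag:

        /* GFP_KERNEL allocations may sleep, which is acceptable in an ioctl
         * but would be fatal on the swap path.  The flag tells the
         * completion path to kfree() the structure instead of returning it
         * to the pre-allocated set. */
        Scsi_Cmnd *SCpnt;

        SCpnt = (Scsi_Cmnd *) kmalloc(sizeof(Scsi_Cmnd), GFP_KERNEL);
        if (!SCpnt)
            return -ENOMEM;             /* fail the ioctl rather than block */
        memset(SCpnt, 0, sizeof(Scsi_Cmnd));
        SCpnt->from_kmalloc = 1;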
 
    As things stand I am still thinking about it.  A quick hack of a workaround can certainly be done to prevent any user-visible symptoms, but at this point I am trying to decide what would make the most sense from an architectural point of view, so I won't bother to code up the workaround unless people start to experience this problem in the field.  Unless I am hit with a brilliant flash of insight, I will probably play with the first possibility and see how much trouble it gets me into.
 
-Eric
 

linux39a.diff
