Heh ... we'll find ways to get better performance from usb-storage
for sure!  :)


> All storage device transfers can be looked at in terms of 3 phases:
> command, (optional) data, and status.
> 
> All service options require an URB for the command, either to a control or
> bulk endpoint.
> Status is passed either via bulk URB or interrupt URB, depending on service
> options. 

So in many typical cases, multiple endpoints are involved.
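
For reference, the phase-to-endpoint mapping works out roughly like this
(my own summary, with made-up names -- not code from usb-storage):

/*
 * Phase-to-endpoint map, sketched:
 *
 *            Control/Bulk[/Interrupt]        Bulk-Only
 *  command   control ep (class request)      bulk-out (CBW)
 *  data      bulk-in / bulk-out              bulk-in / bulk-out
 *  status    interrupt ep (or none)          bulk-in (CSW)
 */
enum xfer_phase {
	XFER_COMMAND,
	XFER_DATA,		/* optional */
	XFER_STATUS,
};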


> Data is always passed via bulk endpoint.  The SCSI layer will allocate the
> scatter-gather segments for me, and those can vary significantly.  Initial
> performance tests suggest that memory allocation is _much_ faster (read 4-6
> times) when smaller segments are used.  That stat is based on the current
> codebase -- when the number of segments is increased (and thus the size of
> each decreases), total throughput jumps dramatically. A "big" segment is
> 4K, a small segment is 512 bytes.

Is that jump because the SCSI layer spent less time allocating?  Or is
it "8 times the data for 4-6 times the cost"?

That ought to be orthogonal to the USB layer, assuming the hardware handles
queued bulk reasonably well ... though for OHCI and EHCI, the bigger
segments will require fewer TDs allocated per transaction, and I think
something Georg said implies that UHCI sometimes wants a 1ms gap
between data URBs/segments (depending on how the bulk queues are done).


> Currently, since I need to maintain the synchronization between endpoints
> manually, I handle each URB individually.

Are C/B* devices required to ignore (NAK) bulk transfers until they get
the control command?  If so, the software synch there can be avoided ...
Otherwise I think it'll be hard to avoid: the HC will have to
finish transferring the command before anything can submit the data.

The completion handler for the command still seems (to me) like it'd be
the natural place to submit the data.  Right now the code seems to
expect a process to be scheduled before it can make progress ... which is
surely not the way to get better performance!
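
Something like this is what I have in mind -- purely a sketch, with the
per-command struct and finish_command() invented for illustration, and
with usb_submit_urb()'s exact signature depending on which tree you're on:

/* Invented per-command bookkeeping, just enough for the sketch. */
struct us_cmd {
	struct urb	*data_urb;	/* prebuilt data-stage URB */
	int		result;
};

static void finish_command(struct us_cmd *cmd);	/* invented: report back to SCSI */

/* Sketch: submit the data stage straight from the command URB's
 * completion handler, instead of waking a thread to do it. */
static void command_stage_complete(struct urb *urb)
{
	struct us_cmd *cmd = urb->context;

	if (urb->status) {
		/* command transfer failed; report it and stop */
		cmd->result = urb->status;
		finish_command(cmd);
		return;
	}

	/* command went out fine -- kick off the already-built data URB
	 * right here, in the completion handler */
	if (usb_submit_urb(cmd->data_urb)) {
		cmd->result = -EIO;
		finish_command(cmd);
	}
}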


>      This works well when I've only
> got one command to deal with at a time, but I'd like to be able to handle
> multiple commands in the queue to improve performance.  This only makes
> sense when I can use my CPU time to construct the URB chains ahead of time,
> and submit them all at once, letting the DMA hardware take care of that
> series.  Note that I don't actually need any completion handler code except
> for the last (and possibly next-to-last in _some_ data-in cases) URB.  The
> URBs really do take care of themselves.  The problem is, they're not all
> bulk transfers.

But any one of them can get an error ... I don't quite see how the URBs can
take care of themselves that much.

I _do_ see how you can take one SCSI command, create all its URBs,
and start processing those all at once ... reducing today's scheduling
overheads by (A) doing more inter-transfer work in completion handlers,
and (B) using bulk queuing for the "data" stage of the transaction.
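
Roughly like this, as a sketch -- the completion handlers and the calling
convention are made up, and it assumes the old scatterlist layout
(plain .address/.length) plus usb_fill_bulk_urb() (FILL_BULK_URB in
older trees); unlinking on error is left out:

static void data_segment_complete(struct urb *urb);	/* error check only */
static void data_stage_complete(struct urb *urb);	/* the "real" handler */

/* Sketch: build one bulk URB per scatter-gather segment up front, then
 * submit them back-to-back so the HC can keep its bulk queue full. */
static int queue_data_stage(struct urb **urbs, int nsegs,
			    struct usb_device *dev, unsigned int pipe,
			    struct scatterlist *sg, void *context)
{
	int i, ret;

	for (i = 0; i < nsegs; i++) {
		usb_fill_bulk_urb(urbs[i], dev, pipe,
				  sg[i].address, sg[i].length,
				  (i == nsegs - 1) ? data_stage_complete
						   : data_segment_complete,
				  context);

		ret = usb_submit_urb(urbs[i]);
		if (ret)
			return ret;	/* caller unlinks whatever is queued */
	}
	return 0;
}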


> Here's what I'd like in my dream world:  I allocate a largeish pool of URBs
> at init time.  I then take a command off the command queue and allocate
> URBs to handle it, and submit them. 

I can see all of that working with today's URB submission API, as sketched
above.  Though I'm thinking the "submit" would be to a small state machine
engine used only by usb-storage, which talks to usbcore as needed.  (Those
synchronous control/bulk submission routines would vanish ...)
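
Something along these lines, maybe -- growing the struct from the earlier
sketch, with all the names still invented, just to show the shape of it:

/* Sketch of a per-command state machine for usb-storage; every name
 * here is invented for illustration, not existing code. */
enum us_cmd_state {
	US_CMD_IDLE,
	US_CMD_COMMAND,		/* command URB in flight */
	US_CMD_DATA,		/* data URB(s) in flight */
	US_CMD_STATUS,		/* status URB in flight */
	US_CMD_DONE,
};

#define US_MAX_SEGS	16	/* arbitrary; sized to the init-time URB pool */

struct us_cmd {
	enum us_cmd_state	state;
	struct urb		*command_urb;
	struct urb		*data_urbs[US_MAX_SEGS];	/* from the pool */
	struct urb		*status_urb;
	int			result;
};

/* Called from every URB completion handler: look at cmd->state and the
 * URB status, submit whatever comes next, and only talk to the SCSI
 * layer (or a thread) when the command finishes or needs recovery. */
static void us_cmd_advance(struct us_cmd *cmd);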


>     While my ?HCI controller is happily
> DMAing data all over the place, I take the next command off the queue and
> begin constructing an URB chain for it.  In the nominal "working" case, I
> get a signal from the final completion handler that tells me to submit the
> next chain.

Yep, that's how that little state machine engine could work.  Or, there could
be a queue that the state machine engine reads from -- no scheduler hookup
to trigger, just keep that engine as busy as the HC and SCSI queue allow.
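
And the final (status-stage) completion handler could be as simple as
this -- again a sketch, with the queue helpers invented:

static void us_cmd_finish(struct us_cmd *cmd);		/* invented: report to SCSI */
static struct us_cmd *us_cmd_dequeue(void);		/* invented: next queued command */

/* Sketch: finish the current command and immediately start the next
 * queued one -- no process wakeup in the fast path. */
static void status_stage_complete(struct urb *urb)
{
	struct us_cmd *cmd = urb->context;
	struct us_cmd *next;

	cmd->result = urb->status;
	us_cmd_finish(cmd);		/* hand the result back to the SCSI layer */

	next = us_cmd_dequeue();	/* next queued command, if any */
	if (next)
		us_cmd_advance(next);	/* submits its command URB */
}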


> > I think the current bulk queuing has lots of mileage left.  I'd rather
> > not change that unless/until we find we're hitting a wall with it.
>
> Wall?  No.  Performance gain?  Yes.  Right now, usb-storage consumes an
> _enormous_ amount of CPU time juggling URBs and checking status.  And
> implementing a command queue is almost pointless.

I think there's a usb-storage performance gain to be had just by using the
current URB submission machinery better ... there's a lot of process/thread
scheduling happening, and getting rid of it should speed things up.

- Dave



