Dave,

Thanks for your quick and insightful response!

* David Brownell <[EMAIL PROTECTED]> [2005-11-25 10:45]:
>> The 32k on-chip memory window poses two problems:
>>  1. the HCD can't use the DMA-mapping API
>
> That doesn't follow.  It needs platform-specific updates,
> but that issue is purely with implementation, not API.
>
>   * struct device.dma_mem is there specifically to handle cases
>     like this for dma-coherent memory ... one implementation is
>     in arch/i386/kernel/pci-dma.c but there are others too.

I agree that this is the best solution, but you may remember the
arm-kernel and LKML discussions over a year ago regarding this
specific HC that concluded that the DMA-mapping API should only
be used for host RAM and that the driver should handle allocation
and bouncing of a device's memory.  More recently, Russel King
didn't accept a patch to implement struct device.dma_mem on ARM,
explaining his concerns about using the DMA API for non-host RAM
and preferring "[dmabounce] to be used only to hack around the
...  problems found on IXP PCI platforms and SA11x0 platform
using the SA1111 device."

http://lkml.org/lkml/2004/6/18/274
http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2004-June/022810.html
http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2005-July/030099.html

But alas, this is a matter that I will take to the appropriate
lists, so for our discussion here, I'll assume I can use struct
device.dma_mem and dmabounce.

> I'd see no problem with even an 8KB/urb limit.  Drivers that
> can't cope with such faults just need to be fixed to back off
> to smaller URB buffers.

Right now, the 32KB initially allocates as:
  4KB - HCCA, dma-coherent buffers --- managed by lib/genalloc.c
  4KB - TDs and EDs sharing one dma_pool (64 TDs + EDs)
  8KB - dmabounce buffers:
         4KB - 64  byte pool
         4KB - 512 byte pool
 16KB - four free pages

This seems to work fine under normal use.  However, if the
genalloc pool, a dma_pool, or a device allocate 2 pages, it may
fragment the pages so a contiguous 8KB allocation will always
fail.  It seems that drivers might need to support a PAGE_SIZE
URB size limit.

I don't have any explicit URB size checks at the moment.  URBs
are either mapped during submit, or usb_submit_urb() returns
-ENOMEM.  I had to modify core/hcd.c to check for dma mapping
failures.  It's the attached patch.

With this bare-bones bus glue and max_sectors=8, the HCD doesn't
run out of dma-able memory while using a keyboard and mouse and:
 # dd if=/dev/sda of=/dev/null bs=64k &         # usb-storage
 # dd if=/dev/sdb of=/dev/null bs=64k &         # usb-storage
 # wget [large file]                            # prism2_usb

That's impressive. =)

> What that means is that on your hardware, drivers would be
> getting URB submission failures and thus tripping some fault
> paths that may not be all that well tested

How frequently should URB submission failures happen?  Without
queueing URBs, if there are 4 devices that submit 4KB URBs back
to back, then the they could prevent a 5th device from receiving
anything but -ENOMEM when submitting a URB.  Similarly, one
usb-storage device with max_sectors=32 could hog the bus.

I want to implement URB queueing, where URB submission succeeds
if the URB could be mapped later.  This would prevent sporadic
submission failures, usb_submit_urb() could return a better error
like -EMSGSIZE for too large URBs, and the URB scheduler can
round-robin EPs to prevent one device from hogging the bus.

> ... so there'd likely be some driver bugs to fix, and
> _potentially_ some cases where "early" dma mapping raises
> interesting issues.
>
> For example, scatterlists from the block layer would probably
> need to shrink ... I don't think either "usb-storage" or "ub"
> will tell it to do that without instructions from userspace,
> usually done based on the device being connected rather than
> the controller.

echo 8 > /sys/block/sda/device/max_sectors; works for now, but
I'd like a better solution for when this HCD is included in a
distribution.  Preferably, usb-storage would learn that it can't
submit large URBs and wouldn't map more than it submits.
max_sectors could take care of this, but it looks like
usb-storage does some 16KB requests as four 4KB URBs.  Is it
possible to use a larger max_sector with 4KB URBs?  That would
help performance immensely.

usb-storage doesn't check for dma mapping failures either.  After
a mapping failure, it still submits a URB with transfer_dma == ~0
and URB_NO_TRANSFER_DMA_MAP set.  That fails horribly:

tc6393-usb tc6393-usb: alloc_safe_buffer: could not alloc dma memory (size=4096)
tc6393-usb tc6393-usb: map_single: unable to map unsafe buffer c3104000!
tc6393-usb tc6393-usb: bad entry cb41d6d0
tc6393-usb tc6393-usb: Trying to unmap invalid mapping
tc6393-usb tc6393-usb: bad entry        1
tc6393-usb tc6393-usb: bad entry 410367d1
tc6393-usb tc6393-usb: alloc_safe_buffer: could not alloc dma memory (size=4096)
tc6393-usb tc6393-usb: map_single: unable to map unsafe buffer c314c000!
tc6393-usb tc6393-usb: bad entry        1
tc6393-usb tc6393-usb: bad entry cb427bd0
tc6393-usb tc6393-usb: bad entry cb42b960
tc6393-usb tc6393-usb: OHCI Unrecoverable Error, disabled
tc6393-usb tc6393-usb: HC died; cleaning up
IRQ LOCK: IRQ122 is locking the system, disabled
usb 1-1: USB disconnect, address 3
usb 1-1.1: USB disconnect, address 4
tc6393-usb tc6393-usb: Trying to unmap invalid mapping
scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 0 
lun 0
scsi1 (0:0): rejecting I/O to device being removed
scsi1 (0:0): rejecting I/O to device being removed
usb 1-1.4: USB disconnect, address 6
usb 1-2: USB disconnect, address 2
scsi1 (0:0): rejecting I/O to dead device

Since the usb-storage maps its own dma-able memory, it would
cause problems with URB queueing.  When I was queueing TDs, the
HCD had dma_mask == NULL, so the HCD handled all dma operations.
I could do this again, as it has a few advantages:
 * The generic allocator manages the 28KB not used by the TD and
   ED dma pool, resulting in more efficient memory usage.
 * Increased performance?  With usb-storage, I'm currently
   getting 568kB/s @ 32% CPU (max_sectors=24) as opposed to
   900kB/s @ 8% CPU (max_sectors=240) with TD queueing.
 * This method doesn't use dmabounce, which is more acceptable
   upstream.

This wouldn't require modification to the USB core if I
temporarily replace the USB completion handler, but this seems
like a hack.  Ideally, struct usb_operations would have a
'complete_urb' or 'giveback_urb' function I would use to unmap
dma memory.

Thanks,
Chris Humbert



--- linux-2.6.14-rc1/drivers/usb/core/hcd.c.orig        2005-11-26 
04:51:35.000000000 -0600
+++ linux-2.6.14-rc1/drivers/usb/core/hcd.c     2005-11-26 22:04:01.000000000 
-0600
@@ -1178,15 +1178,8 @@
         * unless it uses pio or talks to another transport.
         */
        if (hcd->self.controller->dma_mask) {
-               if (usb_pipecontrol (urb->pipe)
-                       && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP))
-                       urb->setup_dma = dma_map_single (
-                                       hcd->self.controller,
-                                       urb->setup_packet,
-                                       sizeof (struct usb_ctrlrequest),
-                                       DMA_TO_DEVICE);
                if (urb->transfer_buffer_length != 0
-                       && !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
+                       && !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP)) {
                        urb->transfer_dma = dma_map_single (
                                        hcd->self.controller,
                                        urb->transfer_buffer,
@@ -1194,6 +1187,31 @@
                                        usb_pipein (urb->pipe)
                                            ? DMA_FROM_DEVICE
                                            : DMA_TO_DEVICE);
+
+                       if (dma_mapping_error (urb->transfer_dma)) {
+                               status = -ENOMEM;
+                               goto done;
+                       }
+               }
+               if (usb_pipecontrol (urb->pipe)
+                       && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
+                       urb->setup_dma = dma_map_single (
+                                       hcd->self.controller,
+                                       urb->setup_packet,
+                                       sizeof (struct usb_ctrlrequest),
+                                       DMA_TO_DEVICE);
+
+                       if (dma_mapping_error (urb->setup_dma)) {
+                               dma_unmap_single (hcd->self.controller, 
+                                               urb->transfer_dma,
+                                               urb->transfer_buffer_length,
+                                               usb_pipein (urb->pipe)
+                                                   ? DMA_FROM_DEVICE
+                                                   : DMA_TO_DEVICE);
+                               status = -ENOMEM;
+                               goto done;
+                       }
+               }
        }
 
        status = hcd->driver->urb_enqueue (hcd, ep, urb, mem_flags);



-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to