Re: [linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-06 Thread David Brownell
On Wednesday 04 May 2005 12:13 pm, Christopher Li wrote:
 I would like to see zero copy for usbfs (and may work on it myself,
 depending on how much time I get).
 
 usbfs2 has been mentioned a few times. What is
 the plan there so far?

It's still in the talk-about-it stage, no plan.
Someone (or more than one someone!) needs to commit
the effort to take it to later stages.


 On Mon, May 02, 2005 at 10:23:11AM -0700, David Brownell wrote:
  
  I was rather shocked to notice that gadgetfs AIO support took only
  a KByte or so of x86 object code, though that wasn't using zerocopy.
  It might be a bit trickier on the host side, mostly to create that
  file per endpoint hook into usbfs ... but once that's there, that
 
 Just want to mention that I actually like the file per device when
 I'm mainly using the submit urb interface. There are fewer files to select
 from. Maybe things will be different in the AIO world.

Yes, it's file-per-endpoint.  File descriptors are cheap though,
and using poll() instead of select() helps with many of the problems
select() has with lots of files.
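
To make the poll() point concrete, here is a minimal sketch in Python
(Linux assumed). Pipes stand in for per-endpoint files, since no such
usbfs files exist yet, and wait_for_readable is a made-up helper name.
poll() takes a list sized to the descriptors you actually care about,
whereas select() works on fixed-size bitmaps bounded by FD_SETSIZE.

```python
import os
import select

def wait_for_readable(fds, timeout_ms=1000):
    """Return the sorted subset of fds that poll() reports as readable."""
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return sorted(fd for fd, ev in events if ev & select.POLLIN)

# One pipe per hypothetical endpoint; only "endpoint" 2 has data pending.
pipes = [os.pipe() for _ in range(4)]
os.write(pipes[2][1], b"data")
ready = wait_for_readable([r for r, _ in pipes])
```

Here ready holds only the read end of the third pipe; the other
descriptors cost nothing beyond their entry in the poll list.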

File-per-endpoint is the way to go for lots of reasons, anyway.  It's
the standard way to do streaming I/O ... and lets file descriptors
be passed to components that have no reason to know about USB.

- Dave


 Chris
 
 


---
This SF.Net email is sponsored by: NEC IT Guy Games.
Get your fingers limbered up and give it your best shot. 4 great events, 4
opportunities to win big! Highest score wins.NEC IT Guy Games. Play to
win an NEC 61 plasma display. Visit http://www.necitguy.com/?r=20
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-04 Thread Christopher Li
I would like to see zero copy for usbfs (and may work on it myself,
depending on how much time I get).

usbfs2 has been mentioned a few times. What is
the plan there so far?


On Mon, May 02, 2005 at 10:23:11AM -0700, David Brownell wrote:
 
 I was rather shocked to notice that gadgetfs AIO support took only
 a KByte or so of x86 object code, though that wasn't using zerocopy.
 It might be a bit trickier on the host side, mostly to create that
 file per endpoint hook into usbfs ... but once that's there, that

Just want to mention that I actually like the file per device when
I'm mainly using the submit urb interface. There are fewer files to select
from. Maybe things will be different in the AIO world.

Chris





Re: [linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-02 Thread Duncan Sands
 I've thought about that on occasion.  On some processors you'd
 need to flush the userspace caches first, but on typical PC-ish
 stuff the main concern would be making sure that the buffers are
 aligned nicely ... i.e. only start a 512 byte packet on a 512 byte
 boundary, since if it crosses pages then most systems aren't going
 to be able to turn it into DMA-contiguous address space.  (Even
 with an IOMMU, it's not guaranteed ...)  In terms of USB protocol,
 one 512 packet != two packets of 500 + 12.

If it's not aligned nicely, then you could send the initial unaligned
bit in its own urb, by copying, and the rest directly out of the
userspace buffer.  I guess the philosophy should be: data will be
transferred correctly regardless of whether the user-space buffer is
well-aligned or not, however if user-space wants maximum performance
then it is responsible for providing an optimally aligned buffer.
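
A rough sketch of that split, in Python for brevity. Addresses are
plain integers, the 512-byte alignment is taken from the packet size
quoted above, and split_for_zerocopy is a hypothetical name, not an
existing usbfs function.

```python
def split_for_zerocopy(addr, length, align=512):
    """Split [addr, addr+length) into a copied head and an aligned remainder.

    Returns ((head_addr, head_len), (body_addr, body_len)). The head would be
    bounced through a kernel buffer; the body could be handed to DMA directly.
    """
    misalign = addr % align
    head_len = 0 if misalign == 0 else min(length, align - misalign)
    return (addr, head_len), (addr + head_len, length - head_len)

aligned = split_for_zerocopy(0x1000, 4096)     # nothing to copy
unaligned = split_for_zerocopy(0x1064, 4096)   # 412-byte head is copied
```

One caveat with this approach: the head goes out as a short packet, so
the device has to tolerate that packetization.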

Ciao,

D.





Re: [linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-02 Thread Oliver Neukum
On Monday, 2 May 2005 03:38, David Brownell wrote:
 It might well be simpler to just pin whatever (aligned) buffers
 have been passed, ensure they're properly flushed, and then just
 DMA to/from those pages without requiring special DMA mappings
 to be set up first, and without needing special new usbfs calls.

How would you make sure the buffers passed are DMA-able?
Also, how would you make sure they can be flushed independently?
You avoid that trouble by providing multiples of PAGE_SIZE through
mmap. Plus, you can easily make available the _sg_ API to user space
this way.

 I was looking at that sort of stuff a while back, in conjunction
 with seeing how the AIO stuff might replace the rather funky
 USB-specific AIO-ish stuff in usbfs.  A lot of the relevant
 infrastructure is already in place.

Yes, ideally we would have files for each endpoint and use AIO. We don't,
though.

 As an example, we now have AIO support in gadgetfs, which I've
 suggested should be the basic model to follow when rewriting
 usbfs.  With a one-to-one mapping between URBs (or usb_requests)
 and kiocbs, an incremental development step might be as simple as
 just adding an ioctl to usbfs to return a new AIO-capable file
 handle for a given endpoint, then using normal AIO calls on that
 to reuse some of the existing zerocopy work ...

Good idea.
 
[..]
 The key point to draw from that is that all this zerocopy stuff
 can (and should!!) be done at layers above usbcore.  If the
 layer above -- usbfs, usbfs2, or even usbfs 1.5 -- passes
 URBs with DMA mappings already established, all the nasty/fragile
 usbcore stuff can be left alone.  And even usbfs could mostly
 be left alone.

Yes.

Regards
Oliver




Re: [linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-02 Thread David Brownell
On Monday 02 May 2005 1:24 am, Duncan Sands wrote:
 
 If it's not aligned nicely, then you could send the initial unaligned
 bit in its own urb, by copying, and the rest directly out of the
 userspace buffer. 

Or more typically, when the file descriptor has O_DIRECT set (which
tends to flag the desire for zerocopy) then treat that as an error.
Otherwise (no O_DIRECT), then either don't try for zerocopy, or else
only try that path when things are aligned safely.
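
As a sketch of that policy (Python, Linux assumed for os.O_DIRECT):
choose_path is a hypothetical helper, and both the 512-byte start
alignment rule and the choice of EINVAL as the error are assumptions
made for illustration.

```python
import errno
import os

def choose_path(open_flags, addr, align=512):
    """Return 'zerocopy' or 'copy', or raise OSError(EINVAL).

    With O_DIRECT set, a misaligned buffer is an error. Without it, zerocopy
    is only attempted when the buffer start is safely aligned.
    """
    aligned = addr % align == 0
    if open_flags & os.O_DIRECT:
        if not aligned:
            raise OSError(errno.EINVAL, "unaligned buffer with O_DIRECT")
        return "zerocopy"
    return "zerocopy" if aligned else "copy"

direct_ok = choose_path(os.O_RDWR | os.O_DIRECT, 0x2000)
buffered = choose_path(os.O_RDWR, 0x2001)
```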

- Dave




Re: [linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-02 Thread David Brownell
On Monday 02 May 2005 3:45 am, Oliver Neukum wrote:
 On Monday, 2 May 2005 03:38, David Brownell wrote:
  It might well be simpler to just pin whatever (aligned) buffers
  have been passed, ensure they're properly flushed, and then just
  DMA to/from those pages without requiring special DMA mappings
  to be set up first, and without needing special new usbfs calls.
 
 How would you make sure the buffers passed are DMA-able?

That's the _normal_ DMA mapping issue.  Nothing special, like
setting up special mmapped areas and using those ... just the
routine stuff that filesystem code deals with routinely if it
provides zerocopy I/O for user read or write activities.


  I was looking at that sort of stuff a while back, in conjunction
  with seeing how the AIO stuff might replace the rather funky
  USB-specific AIO-ish stuff in usbfs.  A lot of the relevant
  infrastructure is already in place.
 
 Yes, ideally we would have files for each endpoint and use AIO. We don't,
 though.

That was the point of my suggestion to add a new usbfs request
to return a new file descriptor for the endpoint.  If it's got
the AIO support, it'd automatically have normal read/write support.
For bulk endpoints, some clear_halt support could be useful.

I was rather shocked to notice that gadgetfs AIO support took only
a KByte or so of x86 object code, though that wasn't using zerocopy.
It might be a bit trickier on the host side, mostly to create that
file per endpoint hook into usbfs ... but once that's there, that
AIO framework should be a good basis for zerocopy work.

- Dave


  As an example, we now have AIO support in gadgetfs, which I've
  suggested should be the basic model to follow when rewriting
  usbfs.  With a one-to-one mapping between URBs (or usb_requests)
  and kiocbs, an incremental development step might be as simple as
  just adding an ioctl to usbfs to return a new AIO-capable file
  handle for a given endpoint, then using normal AIO calls on that
  to reuse some of the existing zerocopy work ...
 
 Good idea.
  
 [..]
  The key point to draw from that is that all this zerocopy stuff
  can (and should!!) be done at layers above usbcore.  If the
  layer above -- usbfs, usbfs2, or even usbfs 1.5 -- passes
  URBs with DMA mappings already established, all the nasty/fragile
  usbcore stuff can be left alone.  And even usbfs could mostly
  be left alone.
 
 Yes.
 
   Regards
   Oliver
 




[linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-01 Thread Harald Welte
Hi!

I've been playing quite a bit with the gnuradio[1] project's USRP[2]
recently.  One of the key issues with the USRP is to get the highest
possible USB2.0 throughput (since the ADCs and DACs actually outperform
the slow USB2.0 bus).  Another key issue is that for digital signal
processing you need lots of CPU.

Since gnuradio uses usbdevfs and libusb in order to rx and tx data to
the USRP, it copies all data back and forth between kernel and
userspace.

Obviously, this kind of high-bandwidth USB use is usually only seen from
kernel drivers (mass storage, ...) whereas libusb+usbdevfs is only used
for lower-bandwidth applications.  So there is room for improvement...

Having a networking background, I thought this is similar to the network
stack and mmap()ed PF_PACKET (and now even PF_RING) sockets.

A quick browse through the EHCI specification and the ehci linux hcd
driver revealed that it should be technically possible to:

0) open a usbdevfs file like usual
1) set up a mmap()ed buffer between kernel and userspace
2) create one (or multiple consecutive) urb that points into the
   mmap()ed buffer
3) submit that urb to ehci-hcd, which would in turn set up qtd's
   pointing directly into that buffer

The result should be a truly zerocopy dma-to-userspace architecture.
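
To illustrate the buffer layout in steps 1 and 2 only, here is a small
userspace simulation in Python. It makes no USB calls at all: the
anonymous mapping stands in for the proposed usbdevfs mmap() (which
does not exist), and URB_BYTES, NUM_URBS and complete() are made-up
names and values.

```python
import mmap

URB_BYTES = 4 * mmap.PAGESIZE   # hypothetical per-urb size, whole pages
NUM_URBS = 4                    # hypothetical queue depth

# Step 1 stand-in: one page-aligned mapping shared by all urbs.
buf = mmap.mmap(-1, URB_BYTES * NUM_URBS)
view = memoryview(buf)

# Step 2 stand-in: consecutive "urbs" as (offset, length) into the mapping.
urbs = [(i * URB_BYTES, URB_BYTES) for i in range(NUM_URBS)]

def complete(urb, payload):
    """Pretend the controller DMAed payload into this urb's slice."""
    off, length = urb
    assert len(payload) <= length
    view[off:off + len(payload)] = payload
    return len(payload)

n = complete(urbs[1], b"\x01\x02\x03")
first_bytes = bytes(view[urbs[1][0]:urbs[1][0] + n])
```

The application reads the data in place from the mapping, which is the
whole point: the payload is written once and never copied again.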

As my only connection with the usb code so far has been the cyberjack
driver and usbdevfs-based userspace programs, I'd like to receive the
comments of people more familiar with the usb subsystem.

Do you think a system described above is actually feasible?  If yes,
what kind of implementation suggestions do you have?  

[1] http://comsec.com/wiki
[2] http://comsec.com/wiki?UniversalSoftwareRadioPeripheral

-- 
- Harald Welte [EMAIL PROTECTED]  http://gnumonks.org/

Privacy in residential applications is a desirable marketing option.
  (ETSI EN 300 175-7 Ch. A6)




Re: [linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-01 Thread Eric Blossom
On Sun, May 01, 2005 at 09:29:41PM +0200, Harald Welte wrote:
 Hi!
 
 I've been playing quite a bit with the gnuradio[1] project's USRP[2]
 recently.  One of the key issues with the USRP is to get the highest
 possible USB2.0 throughput (since the ADCs and DACs actually outperform
 the slow USB2.0 bus).  Another key issue is that for digital signal
 processing you need lots of CPU.

 Since gnuradio uses usbdevfs and libusb in order to rx and tx data to
 the USRP, it copies all data back and forth between kernel and
 userspace.
 
 Obviously, this kind of high-bandwidth USB use is usually only seen from
 kernel drivers (mass storage, ...) whereas libusb+usbdevfs is only used
 for lower-bandwidth applications.  So there is room for improvement...
 
 Having a networking background, I thought this is similar to the network
 stack and mmap()ed PF_PACKET (and now even PF_RING) sockets.
 
 A quick browse through the EHCI specification and the ehci linux hcd
 driver revealed that it should be technically possible to:
 
 0) open a usbdevfs file like usual
 1) set up a mmap()ed buffer between kernel and userspace
 2) create one (or multiple consecutive) urb that points into the
    mmap()ed buffer
 3) submit that urb to ehci-hcd, which would in turn set up qtd's
    pointing directly into that buffer
 
 The result should be a truly zerocopy dma-to-userspace architecture.

This sounds good.  You will definitely need more than one urb queued.
Getting maximum throughput from the USB requires that you ensure that
the endpoint queue for the EHCI is never empty.  We do accomplish that
using the current strategy of submitting multiple urbs from user
space.  The USRP has only about 300us of buffering at 32MB/sec, so not
keeping the EHCI endpoint queue filled is a disaster ;-)
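
For a feel of the numbers, a back-of-the-envelope sketch in Python.
It assumes 32MB/sec means 32,000,000 bytes/sec; the 16 KB urb size and
the 2 ms userspace gap are illustrative values, not measurements, and
the "+1" margin in urbs_to_queue is likewise an assumption.

```python
def bytes_buffered(rate_bytes_per_s, micros):
    """Bytes the device can absorb in `micros` microseconds at the given rate."""
    return rate_bytes_per_s * micros // 1_000_000

def urbs_to_queue(rate_bytes_per_s, urb_bytes, gap_micros):
    """Urbs needed so the endpoint queue survives a userspace gap.

    Enough urbs to cover the bytes moved during the gap (rounded up),
    plus one so the queue is not empty when userspace runs again.
    """
    in_gap = bytes_buffered(rate_bytes_per_s, gap_micros)
    return -(-in_gap // urb_bytes) + 1

usrp_buffer = bytes_buffered(32_000_000, 300)        # 9600 bytes
depth = urbs_to_queue(32_000_000, 16 * 1024, 2000)   # 5 urbs
```

So 300us at that rate is under 10 KB of slack, which is why several
urbs need to be queued at all times.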

I think the biggest benefit would be in giving us potentially lower
latency.  Right now, our throughput bottleneck is in the firmware in
the FX2.  We currently get 32MB/sec.  We know that at least for
unidirectional apps (might not apply to us) 40MB/sec is possible with
the FX2.

With regard to the cost of the user to kernel copy, I suggest making
measurements.  Last time I checked, the driver was only taking up on
the order of 5% of the CPU.  

[FYI, in fusb_ephandle_linux::write you can remove an intermediate
copy for certain values of nbytes and buffer alignments.  This
function never showed up high in oprofile's output, so I didn't bother
coding it.  A memory bandwidth exceeding 1GB/sec covers all kinds of
sins.]

 Do you think a system described above is actually feasible?  If yes,
 what kind of implementation suggestions do you have?  

Definitely feasible.  Again, I think the big win would be in reducing
the latency between the user space app and the actual USRP hardware.
Getting this latency as low as possible would definitely be worth
doing for a lot of applications, including anything with tight Rx to
Tx turn around.

Eric




Re: [linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-01 Thread Oliver Neukum
On Sunday, 1 May 2005 21:29, Harald Welte wrote:
 A quick browse through the EHCI specification and the ehci linux hcd
 driver revealed that it should be technically possible to:
 
 0) open a usbdevfs file like usual
 1) set up a mmap()ed buffer between kernel and userspace
 2) create one (or multiple consecutive) urb that points into the
    mmap()ed buffer
 3) submit that urb to ehci-hcd, which would in turn set up qtd's
    pointing directly into that buffer
 
 The result should be a truly zerocopy dma-to-userspace architecture.
 
 As my only connection with the usb code so far has been the cyberjack
 driver and usbdevfs-based userspace programs, I'd like to receive the
 comments of people more familiar with the usb subsystem.
 
 Do you think a system described above is actually feasible?  If yes,
 what kind of implementation suggestions do you have?  

In static int hcd_submit_urb (struct urb *urb, int mem_flags) you can
see that usbcore really doesn't care what buffer you feed it if you've
called dma_map_single() on it. You will need to make sure that buffers
don't overlap and make sure there's some limit to the ram a task can pin
thus.
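
The two checks could look roughly like this, sketched in Python rather
than kernel C. PinAccount is a hypothetical bookkeeping class; nothing
like it exists in usbcore or usbfs, and the exception types are
arbitrary choices for the sketch.

```python
class PinAccount:
    """Track pinned [start, end) ranges for one task, with a byte limit."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.ranges = []

    def pinned(self):
        return sum(end - start for start, end in self.ranges)

    def pin(self, start, length):
        end = start + length
        # Reject any buffer that overlaps one already pinned.
        if any(start < e and s < end for s, e in self.ranges):
            raise ValueError("buffer overlaps an already pinned range")
        # Enforce the per-task cap on pinned memory.
        if self.pinned() + length > self.limit:
            raise MemoryError("pin limit exceeded")
        self.ranges.append((start, end))

    def unpin(self, start, length):
        self.ranges.remove((start, start + length))

acct = PinAccount(limit_bytes=8192)
acct.pin(0x1000, 4096)
acct.pin(0x3000, 4096)
```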

Regards
Oliver




Re: [linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-01 Thread David Brownell
On Sunday 01 May 2005 1:37 pm, Eric Blossom wrote:
 
 I think the biggest benefit would be in giving us potentially lower
 latency.  Right now, our throughput bottleneck is in the firmware in
 the FX2.  We currently get 32MB/sec.  We know that at least for
 unidirectional apps (might not apply to us) 40MB/sec is possible with
 the FX2.

But not necessarily all EHCI controllers, or all systems.  I've
benched almost 40 MB/sec on some systems; others have a hard time
topping 12 MB/sec, seemingly due to PCI or southbridge problems.
And one system that's previously given me 32 MB/sec seems to have
taken about 6 MB/sec away from me, I'm not sure where it's gone. :(


 With regard to the cost of the user to kernel copy, I suggest making
 measurements.  Last time I checked, the driver was only taking up on
 the order of 5% of the CPU.  

Good point.  A while back I remember doing some stuff that streamed
24 MByte/sec to userspace, single copy on an Athlon.  That ran something
like 10% of that CPU, which was pretty slow by today's standards.  It
was using high bandwidth isochronous transfers, which aren't the most
CPU-efficient mode to work with.  (Each 3KB packet was an individual
AIO request.)
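
The arithmetic behind those figures, as a small Python sketch. It
assumes "3KB" means 3 x 1024 bytes per microframe (the high-bandwidth
isochronous maximum) and 8000 microframes per second at high speed;
the function names are made up.

```python
MICROFRAMES_PER_SEC = 8000   # one microframe every 125 us at high speed

def iso_rate(bytes_per_microframe):
    """Bytes per second for one isochronous packet per microframe."""
    return bytes_per_microframe * MICROFRAMES_PER_SEC

def requests_per_sec(rate_bytes_per_s, bytes_per_request):
    """AIO requests per second when each packet is its own request."""
    return rate_bytes_per_s // bytes_per_request

rate = iso_rate(3 * 1024)                  # 24,576,000 bytes/sec
reqs = requests_per_sec(rate, 3 * 1024)    # 8000 requests/sec
```

Under those assumptions that is 8000 request submissions and
completions per second, which goes some way to explaining the CPU cost.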

- Dave





Re: [linux-usb-devel] mmap() for usbdevfs, zerocopy EHCI ?

2005-05-01 Thread David Brownell
On Sunday 01 May 2005 2:04 pm, Oliver Neukum wrote:
 On Sunday, 1 May 2005 21:29, Harald Welte wrote:
  A quick browse through the EHCI specification and the ehci linux hcd
  driver 

This should be true of any HCD that supports DMA, FWIW.

EHCI is probably most interesting because it's the only
HCD that already achieves dozens of MByte/sec throughput,
hence the only one where a few more might be achievable!
And where the cpu overheads of lots of userspace copies
can start to be objectionable, too.


  revealed that it should be technically possible to: 
  
  0) open a usbdevfs file like usual
  1) set up a mmap()ed buffer between kernel and userspace
  2) create one (or multiple consecutive) urb that points into the
     mmap()ed buffer
  3) submit that urb to ehci-hcd, which would in turn set up qtd's
     pointing directly into that buffer
  
  The result should be a truly zerocopy dma-to-userspace architecture.

I've thought about that on occasion.  On some processors you'd
need to flush the userspace caches first, but on typical PC-ish
stuff the main concern would be making sure that the buffers are
aligned nicely ... i.e. only start a 512 byte packet on a 512 byte
boundary, since if it crosses pages then most systems aren't going
to be able to turn it into DMA-contiguous address space.  (Even
with an IOMMU, it's not guaranteed ...)  In terms of USB protocol,
one 512 packet != two packets of 500 + 12.
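
A quick way to see the alignment issue, sketched in Python. The
4096-byte page size is an assumption (typical for PC-ish systems) and
packets_crossing_pages is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumed page size

def packets_crossing_pages(addr, length, packet=512, page=PAGE_SIZE):
    """Return start addresses of packets that straddle a page boundary."""
    crossing = []
    for start in range(addr, addr + length, packet):
        last = min(start + packet, addr + length) - 1
        if start // page != last // page:
            crossing.append(start)
    return crossing

clean = packets_crossing_pages(0x1000, 8192)     # 512-aligned: none cross
broken = packets_crossing_pages(0x1064, 8192)    # offset by 100 bytes
```

With a 512-aligned start no packet ever crosses a page; with the
100-byte offset, one packet per page boundary does, and those are the
ones that can't be handed to the controller as a single contiguous
piece unless the pages happen to be DMA-contiguous.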


It might well be simpler to just pin whatever (aligned) buffers
have been passed, ensure they're properly flushed, and then just
DMA to/from those pages without requiring special DMA mappings
to be set up first, and without needing special new usbfs calls.

I was looking at that sort of stuff a while back, in conjunction
with seeing how the AIO stuff might replace the rather funky
USB-specific AIO-ish stuff in usbfs.  A lot of the relevant
infrastructure is already in place.

As an example, we now have AIO support in gadgetfs, which I've
suggested should be the basic model to follow when rewriting
usbfs.  With a one-to-one mapping between URBs (or usb_requests)
and kiocbs, an incremental development step might be as simple as
just adding an ioctl to usbfs to return a new AIO-capable file
handle for a given endpoint, then using normal AIO calls on that
to reuse some of the existing zerocopy work ...


  Do you think a system described above is actually feasible?  If yes,
  what kind of implementation suggestions do you have?  
 
 In static int hcd_submit_urb (struct urb *urb, int mem_flags) you can
 see that usbcore really doesn't care what buffer you feed it if you've
 called dma_map_single() on it. You will need to make sure that buffers
 don't overlap and make sure there's some limit to the ram a task can pin
 thus.

The key point to draw from that is that all this zerocopy stuff
can (and should!!) be done at layers above usbcore.  If the
layer above -- usbfs, usbfs2, or even usbfs 1.5 -- passes
URBs with DMA mappings already established, all the nasty/fragile
usbcore stuff can be left alone.  And even usbfs could mostly
be left alone.

Eventually I'd not be surprised to find that an AIO layer on top
of usbcore would highlight some places where the USB I/O path
could usefully be tuned.  Right now the fastest path there is
probably the usb_sg_*() stuff, but there's nothing letting those
techniques (IRQ reduction, transfer queueing, IOMMU coalescing)
be accessed from userspace.

- Dave

