Re: [linux-usb-devel] [PATCH 15/15] usbaudio retries EL2NSYNC

David Brownell Mon, 09 Oct 2006 10:32:33 -0700

On Saturday 07 October 2006 12:47 pm, Alan Stern wrote:
> On Sat, 7 Oct 2006, Christopher "Monty" Montgomery wrote:
> 
> > Let's establish how to report missed slots, I'll update the patches,
> > and then try to figure out what's going on with usbaudio.
> 
> Okay, let's move on to discuss this.  Part of the problem with our earlier
> discussion has been that there are actually 3 queues, any of which can dry
> up:
> 
>       Queue A is the flow of data from the application to snd-usb-audio
>       or some other high-level driver.  If this queue drains it is an
>       xrun and quite probably loss-of-sync.  The driver is free to
>       report the error to the application any way it wants to.  This is 
>       where application latency matters.
> 
>       Queue B is the flow of URBs from the driver to ehci-hcd.  If this


Such issues are not unique to EHCI of course ... so in this email, it's
safe to just read "HCD" unless something specific to that driver is
being discussed.


>       queue drains then the bandwidth is deallocated, something you
>       desperately want to avoid.  It's up to the higher-level driver to
>       keep the queue non-empty, even if that means submitting URBs with
>       dummy data.  Latency has no effect here.
> 
>       Queue C is the flow of packets from the host controller to the
>       device.  If this queue drains it is a loss-of-sync. 

And queue C would never drain unless queue B drains first... since they
are coupled one-to-one.  No packet goes to/from the peripheral (C) unless
it's been told to do so by the driver (B).

One issue is that the driver (ALSA usb, V4L2, etc.) normally uses
URB completions (from the HCD) to drive motion through queue B.  And
those completions are IRQ-driven, so IRQ latency is a factor in being
able to notice that B has actually emptied.


>       The 
>       higher-level driver can find out about these occurrences only
>       by checking various return status codes from ehci-hcd, and then
>       it has to decide how to relay the information to the application.
>       This is where kernel and IRQ latency matter.

Another factor at this level is memory bus bandwidth.  It's probably
safe to assume that this is not an issue for most systems, but with
high bandwidth ISO it could be.  The host controller may not be able
to read buffers in time to write to the peripheral, or write them in
time to prevent lossage.  Your next point is relevant there:


> First a word of warning.  ISO transfers are UNRELIABLE!  A data-out packet
> sent by the host might not be received by the device, and the host would
> have no way to know.  Obviously such errors can't be reported since
> ehci-hcd never realizes they occur.  Of course, this doesn't reduce our
> obligation to report accurately the errors we _do_ know about.
> 
> 
> Errors caused by Queue A draining presumably are already handled in a 
> satisfactory manner.  If they aren't, it's a matter for the higher-level 
> driver; ehci-hcd has nothing to do with it.
> 
> Errors caused by Queue B draining have a fixed meaning according to the
> API, and they are catastrophic.  Fortunately they are easily avoided if
> the higher-level driver is properly configured.

Yes, and yes.


> So we only need to consider errors caused by Queue C draining.  Currently 
> there is no standardized way to report these errors back to the 
> higher-level driver.  Looking through Documentation/usb/error-codes.txt, 
> the closest thing we see are these wonderful entries:
> 
> -EXDEV                        ISO transfer only partially completed
>                       look at individual frame status for details

... that can itself be the individual frame status though!

> -EINVAL                       ISO madness, if this happens: Log off and go
>                       home

... and I'm not sure anyone reports that any more, at least as ISO
frame status.


> That's why I chose EXDEV in uhci-hcd; it seemed to be the closest match.  
> This might be a good time to settle the matter once and for all.
> 
> (Incidentally, the descriptions for EAGAIN and EFBIG are confusing and 
> possibly overlapping.  We should straighten them out as well.)

The EAGAIN description is wrong, as easily shown by grep.  Its only
usage is the more traditional one, that's not USB-specific:  try again.
 

> To be definite, let's suppose a periodic stream has been allocated a
> specific series of slots, and up until slot N-1 everything has been okay.

Where "series" == (u)frames number BASE + X * PERIOD, for all X.
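
To make that definition concrete, here is a minimal sketch of the slot
arithmetic.  The struct and function names are hypothetical (they are not
kernel types), and the sketch deliberately ignores frame-counter
wraparound, which a real HCD would have to handle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a periodic stream's schedule. */
struct iso_schedule {
    uint32_t base;    /* (u)frame number of the first allocated slot */
    uint32_t period;  /* spacing between slots in (u)frames, must be > 0 */
};

/* (u)frame number of slot X, i.e. BASE + X * PERIOD. */
static uint32_t slot_frame(const struct iso_schedule *s, uint32_t x)
{
    return s->base + x * s->period;
}

/* True if `frame` is one of the slots allocated to the stream. */
static bool is_stream_slot(const struct iso_schedule *s, uint32_t frame)
{
    return frame >= s->base && (frame - s->base) % s->period == 0;
}
```

With base 3 and period 8, the allocated slots are (u)frames 3, 11, 19,
and so on.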


> Now URB U is submitted, supposedly starting in slot N.  But something has
> gone wrong, and ehci-hcd isn't able to add U to the hardware schedule in
> time for slot N to be filled.  What should happen?
> 
> Sometimes it will be apparent at submission time that U is already too 
> late.  For instance, slot N's microframe might already be over.  In such 
> cases it is possible to return a submission error.  Let's call this option 
> #1.

Right, and that's the intent of the current reporting of EL2NSYNC.


> Sometimes it won't be apparent until later that the slot was missed.  
> When this happens, ehci-hcd is unable to report a submission error for U.  
> Another possible approach is to report a submission error for the
> following URB, U+1.  Let's call this option #2.

Don't much like that one.  Requires HCDs to record history that would
otherwise not be needed, and interrogate it.


> The only other reasonable option, #3, is to report an error upon the 
> completion of U.  These two events (submission and completion) are the 
> only chances ehci-hcd has to communicate with a higher-level driver.

This #3 is when -EXDEV gets reported, and/or when the start_frame
hiccup gets noticed.  I see no way around having these.
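
As a rough illustration of option #3 from the driver's side, a completion
handler could walk the per-packet status values and count the slots that
were reported as missed.  This is only a sketch: `struct frame_desc` is a
simplified stand-in for the real per-packet descriptor, and treating
-EXDEV as the "missed slot" marker is an assumption based on the uhci-hcd
behavior described above.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the per-packet descriptor of an ISO URB. */
struct frame_desc {
    int status;   /* 0 on success, negative errno otherwise */
};

/* Count the packets whose frame status is -EXDEV (assumed: slot missed). */
static size_t count_missed_slots(const struct frame_desc *desc, size_t n)
{
    size_t missed = 0;

    for (size_t i = 0; i < n; i++) {
        if (desc[i].status == -EXDEV)
            missed++;
    }
    return missed;
}
```

A driver could use the count to decide how much data to drop or pad
before resubmitting.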


> So which option should we use?
> 
> 
> As Dave has already mentioned, options #1 and #2 share certain practical
> problems.  Returning an error for a submission when in fact the submission
> was accepted is _not_ a good policy.  The return code could be a positive
> value, not a negative -Exxx code.  Then it would be necessary to audit all
> the USB drivers that use ISO endpoints, because there could easily be
> cases where the driver checks only for 0 or nonzero.

That audit should be easy enough for the in-core drivers.

 
> We could in fact return an error code and _not_ accept the submission,
> counting on the driver to realize what went wrong and make a new
> submission.  I think this is overly complicated.  And it provides no hint
> as to exactly how many slots were missed.  Finally, it increases the
> amount of work performed by both the higher-level driver and ehci-hcd --
> exactly the sort of thing you _don't_ want to do when a stream is lagging
> behind.

I don't know how ALSA works just now, but I suspect that it'd be better
to return a status code meaning "I couldn't queue this, your driver is
N uframes behind" so that the driver could retry intelligently by just
skipping those uframes.

So this question is best answered by the drivers who would be using
that recovery mechanism ... notably ALSA and V4L2.
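
For illustration only, here is what "retry by skipping those uframes"
might look like in a driver.  No such status code exists today, so the
"N uframes behind" input is purely hypothetical, as are both function
names.  The sketch assumes one packet per slot and slots spaced `period`
uframes apart.

```c
#include <stdint.h>

/* Number of slots already lost when the stream is behind_uframes late. */
static uint32_t slots_to_skip(uint32_t behind_uframes, uint32_t period)
{
    /* Round up: a partially elapsed period still costs a whole slot. */
    return (behind_uframes + period - 1) / period;
}

/*
 * Index of the first packet in the URB that is still worth sending.
 * Returns n_packets when the whole URB is stale.
 */
static uint32_t first_live_packet(uint32_t n_packets,
                                  uint32_t behind_uframes,
                                  uint32_t period)
{
    uint32_t skip = slots_to_skip(behind_uframes, period);

    return skip < n_packets ? skip : n_packets;
}
```

The driver would then resubmit starting from that packet, or drop the
URB entirely if nothing in it is still current.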

 

> I don't know just how bad that disadvantage is.  No doubt it depends on
> the nature of the application.  However the fact remains, once sync has
> been lost it's already too late to recover fully.  All you can do is
> recover as much as possible, as quickly as possible.  Delaying the error
> notification until U completes won't slow down recovery significantly.

That's part of why the USB spec defines feedback mechanisms for ISO.

> 
> 
> There's one final matter I want to bring up, having to do with 
> urb->start_frame.  The API doesn't say much about how start_frame should 
> be interpreted once a stream is already established.  The assumption is 
> that drivers won't use it, specifying URB_ISO_ASAP instead to cause each 
> URB to fill the slot immediately following the end of the previous URB.

That's one assumption.


> I don't know how ohci-hcd and ehci-hcd treat start_frame in an established 
> stream.

Those HCDs have never treated it as anything other than an output that's
accessible in URB completion.  That is, the ISO_ASAP thing is assumed,
and start_frame is never used as a "please start this much in the future"
input.
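
Since start_frame is only an output, a driver that wants to notice a
hiccup has to compare it against what the previous URB implies.  A
minimal sketch, assuming a frame counter that wraps at a power of two
(`frame_mask` is that size minus one) and one packet per slot; the
function name and parameters are hypothetical:

```c
#include <stdint.h>

/*
 * Number of slots skipped between the end of the previous URB and the
 * reported start_frame of this one.  Zero means the stream is contiguous.
 */
static uint32_t slots_missed(uint32_t prev_start, uint32_t prev_packets,
                             uint32_t period, uint32_t this_start,
                             uint32_t frame_mask)
{
    /* Where this URB should have started if nothing was missed. */
    uint32_t expected = (prev_start + prev_packets * period) & frame_mask;
    /* Distance from the expected start, taken modulo the counter size. */
    uint32_t gap = (this_start - expected) & frame_mask;

    return gap / period;
}
```

The modular subtraction keeps the comparison valid across a wrap of the
frame counter.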

- Dave
