Re: [linux-usb-devel] Re: Patch to change error code for "device not responding"

David Brownell Thu, 10 Feb 2005 15:15:56 -0800

On Thursday 10 February 2005 12:08 pm, Alan Stern wrote:
> 
>        "Device not responding" means either that the device
> isn't working right or that it's disconnected from the bus (or there's a
> lot of interference on the line, or a hardware fault...).  It's a
> low-level problem.  "Request timed out" means that everything is working
> correctly at the USB protocol level, and the device simply is unable to
> send or receive data at the current time.  It's a higher-level error.


But do any drivers care about that difference?  None I've heard of do.

And the confusion only comes up with the synchronous usb_*_msg() calls,
since otherwise there's no ambiguity.  Drivers that care about fault
handling mostly avoid those calls anyway.  (The synchronous calls
prevent reasonable signal handling, blocking CTRL-C ...)


> > > Do you know of any drivers that rely on ETIMEDOUT indicating "device not 
> > > responding"?  Especially considering that neither UHCI nor EHCI returns 
> > > that code?  
> > 
> > My friend Mr. GREP showed a bunch, and that's only for the in-tree drivers.
> > (I take it you didn't do that when you submitted that doc patch... even
> > within the directory with HCDs returning ETIMEDOUT.)
> 
> Hold on there!  I said "rely on".  That means they won't work if some 
> other code is used for "no response".  In fact there can't be any driver 
> like that, because neither UHCI nor EHCI uses ETIMEDOUT.  If a driver did 
> rely on that code then it wouldn't work with those HCDs.

Code that references a behavior relies on it too ... so that code
handling urb->status == -ETIMEDOUT would become roadkill given
that patch you sent.  A fair amount of such code does exist, and
if that fault code were to change, that code should change too.

Again:  most of that code doesn't use the synchronous calls,
so it's never subject to this confusion...
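
As a rough illustration of the kind of code meant above, here is a
minimal userspace sketch of a status check such as a completion handler
might make.  The enum, the function name classify_status() and the
actions chosen for each code are all hypothetical; only the errno
values are real.  'status' stands in for urb->status.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical actions a driver might take; not kernel names. */
enum fault_action {
    ACT_NONE,     /* transfer completed normally */
    ACT_RETRY,    /* treat as transient and resubmit */
    ACT_GIVE_UP   /* treat the device as gone */
};

/*
 * Sketch of a completion-time status check.  'status' is 0 or a
 * negative errno, standing in for urb->status.
 */
enum fault_action classify_status(int status)
{
    assert(status <= 0);

    switch (status) {
    case 0:
        return ACT_NONE;
    case -ETIMEDOUT:
        /* Assumes the HCD reports "no response" with this code. */
        return ACT_GIVE_UP;
    case -EPROTO:
    default:
        /* Illustrative choice only: everything else is retried. */
        return ACT_RETRY;
    }
}
```

If an HCD stopped reporting "no response" as -ETIMEDOUT and used, say,
-EPROTO instead, the -ETIMEDOUT branch in a handler like this would
silently stop firing and the fault would be retried rather than given
up on.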


> > They've certainly had almost six years to notice that the documentation
> > says urb->status may return ETIMEDOUT, and that in fact it can do so.
> > Versus only a few months to notice that documentation-only change...
> 
> The old documentation was confusing at best.  As I recall, it said
> ETIMEDOUT means something like "No ack, request timed out".

When 2.6 started it said "transfer timed out, NAK".  That was later
tweaked to flag this as one of the hardware-specific codes that gets
returned between "device unplugged" and "khubd disconnects driver".

It'd have been better to say "no NAK", or otherwise clarify that,
rather than delete it; though at least those disconnect paths already
imply that no NAKs were received.


> > "Better" has never been clear to me or I'd have changed something
> > myself.  Changing fault handling logic is unfortunately error prone.
> > When POSIX started to change error code semantics to reduce the
> > overlap, it took many years for kernels and applications to adapt.
> > This seems like nice cleanup for a 2.7-on-the-way-to-2.8/3.0 kernel,
> > certainly, but otherwise it seems more like needless instability.
> 
> I'm willing to wait.

It's unclear to many of us when a Linux.next will start, but lots of
folk _do_ think that there are various changes (like that one) which
shouldn't be merged until then.  Not that Linux overall has the sort
of release discipline to prevent that sort of destabilization; it's
an issue that doesn't quite get raised as part of the "is this 'new
development model' working" debates.  (The old model didn't handle
it explicitly either!!)


> > Do you not like ETIME?  Or does that just beg too many questions
> > about whether the error isn't really in usb_{bulk,control}_msg()
> > for using ETIMEDOUT, when ETIME is a better match for what it's
> > indicating?  No matter what such a change is, it's got to touch
> > a lot of fault handling logic.
> 
> I actually do like ETIME for "request timed out", reserving ETIMEDOUT for 
> "no response".  I'm just concerned that it affects user programs as well 
> as the kernel.

Well, such changes have hard-to-predict effects.  Certainly returning
EPROTO would make the EPROTO confusions even worse, and either sort
of change would be visible to userspace.
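
For concreteness, the split discussed here (ETIME for "request timed
out", ETIMEDOUT for "no response") could be written down as below.
This is only a sketch of that idea, not existing kernel code; the enum
and fault_to_errno() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for the two conditions discussed in this thread. */
enum usb_fault {
    FAULT_REQUEST_TIMED_OUT,  /* protocol fine, device just not ready */
    FAULT_NO_RESPONSE         /* device silent, unplugged, or line fault */
};

/* Map each condition to the negative errno under the discussed split. */
int fault_to_errno(enum usb_fault fault)
{
    assert(fault == FAULT_REQUEST_TIMED_OUT || fault == FAULT_NO_RESPONSE);

    switch (fault) {
    case FAULT_REQUEST_TIMED_OUT:
        return -ETIME;
    case FAULT_NO_RESPONSE:
    default:
        return -ETIMEDOUT;
    }
}
```

Either value would reach userspace as a different errno than some
callers see today, which is the hard-to-predict part.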


> The message from Brian Murphy shows a good example. 

Not really; that was just a case where the synchronous call has
been doing the wrong thing for a long time:  it was inappropriately
reporting a fault.  The specific fault code doesn't matter there;
the issue was discarding data that was successfully read.
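
A minimal sketch of the "don't discard the data" half of that point,
using a hypothetical stand-in struct rather than the kernel's struct
urb.  The names xfer_result and report_sync_result() are assumptions
made for illustration.  Whether the fault should also be suppressed
when data did arrive is a separate question this sketch leaves alone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a finished transfer. */
struct xfer_result {
    int status;            /* 0 or a negative errno */
    size_t actual_length;  /* bytes actually transferred */
};

/*
 * Report the outcome of a synchronous read.  The byte count is stored
 * before the status is returned, so data that arrived ahead of a fault
 * stays visible to the caller whatever the fault code is.
 */
int report_sync_result(const struct xfer_result *res, size_t *actual)
{
    assert(res != NULL && actual != NULL);

    *actual = res->actual_length;
    return res->status;
}
```

A caller that gets a fault back can then still look at *actual and use
whatever was read.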

- Dave

