> From: Laurent Pinchart <[EMAIL PROTECTED]>
> To: Alan Stern <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 21:04:32 +0100
>
>       ...
>
> Sunplus, the chip maker, investigated the problem and here is their 
> explanation:
>
> "STALL: The command timing in the linux is much faster than windows
> system.

Presumably because Windows doesn't drive the hardware anywhere
near its design capacity.  By and large, those timings come from
the hardware; all Linux does is set up the requests and tell the
hardware to perform them.


>       This situation happens after the USB command is finished after
> status stage occasionally. In the ISR, when firmware takes care an ACK
> interrupt, for example an IN_ACK interrupt, firmware will clean the IN_ACK
> event by “ CBREG_USB_Ep0AckEvt &= ~INTR_USB_EP0_IN_ACK; ”. In this code,
> CPU will do (1) Read CBREG_USB_Ep0AckEvt to a buffer, (2) Use buffer to do
> the AND operation, (3) Write the buffer value back to register
> CBREG_USB_Ep0AckEvt. If a SETUP_ACK event is enabled after between (1) and
> (3), this event will be clear after (3) is done. This will cause the STALL
> problem. Firmware will miss the SETUP_ACK event and thinks that a NAK event
> is received after the USB command is finished and then return a STALL."

That class of bugs would be easily uncovered if these vendors
used the Linux "usbtest" program and required their systems to
pass the "test #10" control tests:

        http://www.linux-usb.org/usbtest/#usbtest

As it says on that page:

        Test 10 has been particularly effective at shaking out
        low-level controller and driver bugs on both host and
        peripheral/gadget sides. It issues many back-to-back
        control transfers, and induces faults such as protocol
        stalls; so it's exposed races, fault handling bugs,
        and various annoying combinations of the two.

And it does so as fast as the host can deliver them, which is a
hardware limit (unless the software gets way behind).

Overall, it's a usefully sadistic little test ... but one that
most of the Linux-USB peripheral controller drivers can run for
a week without failing.  ;)

You might suggest to Sunplus that they use it to help avoid
shipping products with this type of easily reproduced bug.


> A related USB trace captured with a USB analyser is available at 
> http://www.irobotique.be/sunplus-trace.jpg. Please note that this
> trace isn't related to the usbmon trace I sent in my last e-mail.
>
> The explanation seems a bit unclear to me. What I understand is that 
> interrupts can be lost if they arrive at the wrong time (seems like
> a broken microcontroller to me if you can clear interrupt bits by
> writing 0).

It *might* be a broken USB peripheral design; it's easy enough to
end up with signal races if you're not careful.

The text you quote makes it sound like they didn't separate SETUP
acks from other packet acks, which would be a different type of
design bug since setup packets may come at any time (aborting any
currently active but uncompleted requests).

Alternatively, this could just be sloppy programming that's gotten
by for some time now because in "normal" operation that race isn't
very common.  It'd just cause a bit more "background noise" in
terms of flaky products in the field ... hard to say what caused
any given non-reproducible failure.

If there are bits with write-zero-to-clear semantics, paranoid
programmers will always write one unless they *intend* to clear.
Likewise with write-one-to-clear: they always write zero unless
they want to clear it on that path.  When dealing with hardware,
that type of paranoia is more than just healthy!!

- Dave


