Hi,

On Thu, Jun 06, 2019 at 05:11:04PM -0400, A Sun wrote:
> Hi Sean,
> 
> Thanks again for reviewing my patch submission.
> 
> This patch updates mceusb RX and TX HALT error handling and recovery 
> reporting,
> and provides placeholder for a later patch.
> 
> The possible later patch would have mceusb call usb_reset_device() in place 
> of the
> new "... requires USB Reset ..." message. I say "possible" because I'm 
> running into
> multiple issues invoking usb_reset_device() from within mceusb.
> 
> I'll also mention the difficulty obtaining even a single fault replication
> (~ 1/month) and test cycle.

OK, thank you.

> Additional comments below...  ..A Sun
> 
> On 6/6/2019 5:53 AM, Sean Young wrote:
> > On Sat, Jun 01, 2019 at 07:35:09PM -0400, A Sun wrote:

-snip-

 >> Additional findings are "modprobe -r mceusb" and "modprobe mceusb"
> >> are unable to clear stuck RX or TX HALT state.
> >> Attempts to call usb_reset_endpoint() and usb_reset_configuration()
> >> from within the mceusb driver also did not clear stuck HALTs.
> >>
> >> A possible future patch could have the mceusb call usb_reset_device()
> >> on itself, when necessary. Limiting the number of HALT error recovery
> >> attempts may also be necessary to prevent halt/clear-halt loops.
> >>
> >> Unresolved for now, deadlock complications occur if mceusb's worker
> >> thread, mceusb_deferred_kevent(), calls usb_reset_device(),
> >> which calls mceusb_dev_disconnect(), which calls cancel_work_sync(),
> >> which waits on the still active worker thread.
> > 
> > I think you can call usb_lock_device_for_reset() to prevent that problem.
> 
> Deadlock still occurs in test:
> mceusb_deferred_kevent()
>     ->usb_reset_device()
>         ->mceusb_dev_disconnect()
>             ->cancel_work_sync()
> where cancel_work_sync() blocks because mceusb_deferred_kevent() is active.

I understand. The usb_reset_device() does not need to happen synchronously
in mceusb_deferred_kevent(). Possibly another delayed workqueue.

Actually there is also a function usb_queue_reset_device() which might do
what you want here.

> 
> > It would be nice to see this implemented too.
> > 
> 
> Any additional tips to implement would help!
> How to validate and test a rare condition with a larger audience?

This is hard. Do you know the model of the mceusb and host hardware?

> >>
> >> Signed-off-by: A Sun <as10...@comcast.net>
> >> ---
> >>  drivers/media/rc/mceusb.c | 35 +++++++++++++++++++++++++----------
> >>  1 file changed, 25 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/media/rc/mceusb.c b/drivers/media/rc/mceusb.c
> >> index efffb1795..5d81ccafc 100644
> >> --- a/drivers/media/rc/mceusb.c
> >> +++ b/drivers/media/rc/mceusb.c
> >> @@ -765,7 +765,7 @@ static void mceusb_defer_kevent(struct mceusb_dev *ir, 
> >> int kevent)
> >>  {
> >>    set_bit(kevent, &ir->kevent_flags);
> >>    if (!schedule_work(&ir->kevent))
> >> -          dev_err(ir->dev, "kevent %d may have been dropped", kevent);
> >> +          dev_dbg(ir->dev, "kevent %d already scheduled", kevent);
> >>    else
> >>            dev_dbg(ir->dev, "kevent %d scheduled", kevent);
> >>  }
> >> @@ -1404,19 +1404,26 @@ static void mceusb_deferred_kevent(struct 
> >> work_struct *work)
> >>            container_of(work, struct mceusb_dev, kevent);
> >>    int status;
> >>  
> >> +  dev_info(ir->dev, "kevent handler called (flags 0x%lx)",
> >> +           ir->kevent_flags);
> >> +
> >>    if (test_bit(EVENT_RX_HALT, &ir->kevent_flags)) {
> >>            usb_unlink_urb(ir->urb_in);
> >>            status = usb_clear_halt(ir->usbdev, ir->pipe_in);
> >> +          dev_err(ir->dev, "rx clear halt status = %d", status);
> >>            if (status < 0) {
> >> -                  dev_err(ir->dev, "rx clear halt error %d",
> >> -                          status);
> >> -          }
> >> -          clear_bit(EVENT_RX_HALT, &ir->kevent_flags);
> >> -          if (status == 0) {
> >> +                  /*
> >> +                   * Unable to clear RX stall/halt.
> >> +                   * Will need to call usb_reset_device().
> >> +                   */
> >> +                  dev_err(ir->dev,
> >> +                          "stuck RX HALT state requires USB Reset Device 
> >> to clear");
> > 
> > Here you say if the usb_clear_halt() returns < 0 then we're in the bad
> > code path. But in this code path, we're not re-submitting the urb so
> > we wouldn't end up in an infinite loop.
> > 
> 
> The theory for the infinite loop case is usb_clear_halt() returned 0.
> The infinite loop isn't resolved with this patch.
> 
> > 
> >> +          } else {
> >> +                  clear_bit(EVENT_RX_HALT, &ir->kevent_flags);
> > 
> > Why have you moved this line?
> > 
> 
> Clear bit but only if usb_clear_halt() recovery succeeds.

To what affect? Will it be attempted again? Might it cause an infinite
loop?
> 
> A future usb_device_reset() operation affects both RX and TX and its
> success code path must clear both EVENT_RX_HALT and EVENT_TX_HALT
> and do its own usb_submit_urb() with message "rx reset device submit urb 
> status = %d"
> 
> Hence, the moved line.

That's in a future patch. Please only change error strings in this patch.

Thanks,

Sean

Reply via email to