Mathias Nyman <[email protected]> writes:
> The retry was a workaround for a case triggered on v4.1.4 by
> Vincent Pelletier (added to CC)
>
> http://marc.info/?l=linux-usb&m=144031185222108&w=2
>
> The race could very well explain that issue.
>From logs of that thread, it would has possibility about this race.
>> @@ -320,15 +367,29 @@ static int xhci_abort_cmd_ring(struct xh
>> udelay(1000);
>> ret = xhci_handshake(&xhci->op_regs->cmd_ring,
>> CMD_RING_RUNNING, 0, 3 * 1000 * 1000);
>> - if (ret == 0)
>> - return 0;
>> + if (ret < 0) {
>> + xhci_err(xhci, "Stopped the command ring failed, "
>> + "maybe the host is dead\n");
>> + xhci->xhc_state |= XHCI_STATE_DYING;
>> + xhci_halt(xhci);
>> + return -ESHUTDOWN;
>> + }
>> + }
>
> So if we can verify that the race fix solves the old issue for Vincent
> Pelletier then we could get rid of the abort retry above as well.
Right. However, I'm not sure (and maybe you neither). So I didn't
remove it in this patchset.
(After this patchset, I guess the retry will hit only when actual chip
internal confusion. I.e. retry will fail, then will chip halted.)
Well, anyway, if you decide to try to remove the retry, I also think it
would worth to try. However, it would be [PATCH 4/4] or as another
patchset.
>> + /*
>> + * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
>> + * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
>> + * but the completion event in never sent. Wait 2 secs (arbitrary
>> + * number) to handle those cases after negation of CMD_RING_RUNNING.
>> + */
>
> I'm speculating a bit here, but as we now can sleep, and if we could
> get rid of the abort retry couldn't we swap the order of the
> xhci_handshake(CMD_RING_RUNNING) busywait and
> wait_for_completion_timeout(). We could also use a standard command
> timeout time while waiting for the completion
>
> something like:
>
> hci_write_64(CMD_RING_ABORT)
> timed_out = wait_for_completion_timeout(XHCI_CMD_DEFAULT_TIMEOUT)
> xhci_handshake(CMD_RING_RUNNING)
> if (handshake fail) {
> xhci_halt(), etc..
> return -ESHUTDOWN
> }
> if (timed out) {
> xhci_cleanup_command_queue(xhci);
> return
> }
>
> It would reduce the time we spend in the xhci_handshake() busyloop
BTW, busyloop removal was done by "[PATCH 3/3] ...". By [PATCH 3/3], we
can simply use straight forward coding without busyloop.
>> diff -puN drivers/usb/host/xhci.h~xhci-fix-abort-race2
>> drivers/usb/host/xhci.h
>> --- xhci/drivers/usb/host/xhci.h~xhci-fix-abort-race2 2016-11-16
>> 18:38:52.552146764 +0900
>> +++ xhci-hirofumi/drivers/usb/host/xhci.h 2016-11-16 18:38:52.554146762
>> +0900
>> @@ -1574,6 +1574,7 @@ struct xhci_hcd {
>> struct list_head cmd_list;
>> unsigned int cmd_ring_reserved_trbs;
>> struct delayed_work cmd_timer;
>> + struct completion stop_completion;
>
> Tiny thing, but maybe change stop_completion to
> cmd_ring_stop_completion. to avoid mixing it with stopping host
> controller, stop endpoint, stop device etc
OK.
--
OGAWA Hirofumi <[email protected]>
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html