Typing Greg's email correctly this time.  My apologies.

Eugene 

-----Original Message-----
From: Eugene Bordenkircher 
Sent: Friday, October 29, 2021 10:14 AM
To: linux-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Cc: leoyang...@nxp.com; ba...@kernel.org; gre...@linuxfoundataion.org
Subject: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to 
unrecoverable loop.

Hello all,

We've discovered a situation where the FSL udc driver 
(drivers/usb/gadget/udc/fsl_udc_core.c) enters a loop iterating over the 
request queue, but the queue has been corrupted at some point, so it loops 
forever.  I believe we have narrowed it down to the offending code, but we 
need help finding an appropriate fix.  The code in question appears to be 
present in every version of the Linux kernel that contains the driver.

The problem appears to be when handling a USB_REQ_GET_STATUS request.  The 
driver gets this request and then calls the ch9getstatus() function.  In this 
function, it starts a request by "borrowing" the per device status_req, filling 
it in, and then queuing it with a call to list_add_tail() to add the request to 
the endpoint queue.  Right before it exits the function, however, it calls 
ep0_prime_status(), which fills out that same status_req structure and then 
queues it with another call to list_add_tail() to add the request to the 
endpoint queue.  This adds the exact same list_head to the endpoint queue 
twice, which breaks the list since the prev and next pointers end up pointing 
to the wrong things.  This causes a hard loop the next time nuke() gets 
called, which happens on the next setup IRQ.
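
To illustrate the corruption outside the driver, here is a minimal userspace 
sketch.  It is not the driver code: struct list_head and list_add_tail() below 
are stand-ins that, as far as I can tell, perform the same pointer updates as 
the kernel helpers, and steps_back_to_head() and queue_status_req() are 
hypothetical helpers that exist only for this example.  The walk is a plain 
traversal with a step limit, not the actual loop in nuke().

```c
#include <stddef.h>

/* Stand-in for the kernel's circular, doubly linked struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the queue. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head->prev;

	head->prev = entry;
	entry->next = head;
	entry->prev = prev;
	prev->next = entry;
}

/*
 * Hypothetical helper: follow ->next from head and return the number of
 * entries seen before arriving back at head, or -1 if head is not reached
 * within limit steps (a real traversal would spin forever instead).
 */
static int steps_back_to_head(const struct list_head *head, int limit)
{
	const struct list_head *pos = head->next;
	int steps = 0;

	while (pos != head) {
		if (++steps > limit)
			return -1;
		pos = pos->next;
	}
	return steps;
}

/*
 * Hypothetical helper: queue one status_req node `times` times on an empty
 * queue, then try to walk the queue.
 */
static int queue_status_req(int times, int limit)
{
	struct list_head queue, status_req;
	int i;

	init_list_head(&queue);
	for (i = 0; i < times; i++)
		list_add_tail(&status_req, &queue);

	return steps_back_to_head(&queue, limit);
}
```

With this sketch, queue_status_req(1, 1000) walks one entry and gets back to 
the head.  queue_status_req(2, 1000) never does: the second list_add_tail() 
sees status_req as the current tail, so it leaves status_req.next and 
status_req.prev pointing at status_req itself, and the walk hits the step 
limit.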

I'm not sure what the appropriate fix to this problem is, mostly due to my lack 
of expertise in USB and this driver stack.  The code has been this way in the 
kernel for a very long time, which suggests that it has been working, unless 
USB_REQ_GET_STATUS requests are never made.  This further suggests that there 
is something else going on that I don't understand.  Deleting the call to 
ep0_prime_status() and the following ep0stall() call appears, on the surface, 
to get the device working again, but may have side effects that I'm not seeing.

I'm hopeful someone in the community can help provide some information on what 
I may be missing or help come up with a solution to the problem.  A big thank 
you to anyone who would like to help out.

Eugene
