On Mon, 2021-11-29 at 23:48 +0000, Eugene Bordenkircher wrote: > Agreed, > > We are happy pick up the torch on this, but I'd like to try and hear from > Joakim first before we do. The patch set is his, so I'd like to give him the > opportunity. I think he's the only one that can add a truly proper > description as well because he mentioned that this includes a "few more > fixes" than just the one we ran into. I'd rather hear from him than try to > reverse engineer what was being addressed. > > Joakim, if you are still watching the thread, would you like to take a stab > at it? If I don't hear from you in a couple days, we'll pick up the torch > and do what we can. >
I am far away from this now and still on 4.19. I don't mind if you tweak tweak the patches for better "upstreamability" Regards Joakim > Eugene T. Bordenkircher > > -----Original Message----- > From: Leo Li <leoyang...@nxp.com> > Sent: Monday, November 29, 2021 3:37 PM > To: Eugene Bordenkircher <eugene_bordenkirc...@selinc.com>; Thorsten Leemhuis > <regressi...@leemhuis.info>; jo...@infinera.com > <joakim.tjernl...@infinera.com>; linuxppc-dev@lists.ozlabs.org; > linux-...@vger.kernel.org > Cc: gre...@linuxfoundation.org; ba...@kernel.org > Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to > unrecoverable loop. > > [Caution - External] > > > -----Original Message----- > > From: Eugene Bordenkircher <eugene_bordenkirc...@selinc.com> > > Sent: Monday, November 29, 2021 11:25 AM > > To: Thorsten Leemhuis <regressi...@leemhuis.info>; jo...@infinera.com > > <joakim.tjernl...@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux- > > u...@vger.kernel.org > > Cc: Leo Li <leoyang...@nxp.com>; gre...@linuxfoundation.org; > > ba...@kernel.org > > Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list > > leads to unrecoverable loop. > > > > The final result of our testing is that the patch set posted seems to > > address all known defects in the Linux kernel. The mentioned > > additional problems are entirely caused by the antivirus solution on > > the windows box. The antivirus solution blocks the disconnect > > messages from reaching the RNDIS driver so it has no idea the USB > > device went away. There is nothing we can do to address this in the Linux > > kernel. > > Thanks for the confirmation. > > > > > I propose we move forward with the patchset. > > I think that we should proceed to merge the patchset but it seems to need > some cleanup for coding style issues and better description before submitted > formally. > > > > > Eugene T. Bordenkircher > > > > -----Original Message----- > > From: Thorsten Leemhuis <regressi...@leemhuis.info> > > Sent: Thursday, November 25, 2021 5:59 AM > > To: Eugene Bordenkircher <eugene_bordenkirc...@selinc.com>; Thorsten > > Leemhuis <regressi...@leemhuis.info>; Joakim Tjernlund > > <joakim.tjernl...@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux- > > u...@vger.kernel.org > > Cc: leoyang...@nxp.com; gre...@linuxfoundation.org; ba...@kernel.org > > Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list > > leads to unrecoverable loop. > > > > Hi, this is your Linux kernel regression tracker speaking. > > > > Top-posting for once, to make this easy to process for everyone: > > > > Li Yang and Felipe Balbi: how to move on with this? It's quite an old > > regression, but nevertheless it is one and thus should be fixed. Part > > of my position is to make that happen and thus remind developers and > > maintainers about this until the regression is resolved. > > > > Ciao, Thorsten > > > > On 16.11.21 20:11, Eugene Bordenkircher wrote: > > > On 02.11.21 22:15, Joakim Tjernlund wrote: > > > > On Sat, 2021-10-30 at 14:20 +0000, Joakim Tjernlund wrote: > > > > > On Fri, 2021-10-29 at 17:14 +0000, Eugene Bordenkircher wrote: > > > > > > > > > > We've discovered a situation where the FSL udc driver > > (drivers/usb/gadget/udc/fsl_udc_core.c) will enter a loop iterating > > over the request queue, but the queue has been corrupted at some point > > so it loops infinitely. I believe we have narrowed into the offending > > code, but we are in need of assistance trying to find an appropriate > > fix for the problem. The identified code appears to be in all > > versions of the Linux kernel the driver exists in. > > > > > > > > > > > > The problem appears to be when handling a USB_REQ_GET_STATUS > > request. The driver gets this request and then calls the > > ch9getstatus() function. In this function, it starts a request by > > "borrowing" the per device status_req, filling it in, and then queuing > > it with a call to list_add_tail() to add the request to the endpoint > > queue. Right before it exits the function however, it's calling > > ep0_prime_status(), which is filling out that same status_req > > structure and then queuing it with another call to list_add_tail() to > > add the request to the endpoint queue. This adds two instances of the > > exact same LIST_HEAD to the endpoint queue, which breaks the list > > since the prev and next pointers end up pointing to the wrong things. > > This ends up causing a hard loop the next time nuke() gets called, which > > happens on the next setup IRQ. > > > > > > > > > > > > I'm not sure what the appropriate fix to this problem is, mostly > > > > > > due to > > my lack of expertise in USB and this driver stack. The code has been > > this way in the kernel for a very long time, which suggests that it > > has been working, unless USB_REQ_GET_STATUS requests are never made. > > This further suggests that there is something else going on that I don't > > understand. > > Deleting the call to ep0_prime_status() and the following ep0stall() > > call appears, on the surface, to get the device working again, but may > > have side effects that I'm not seeing. > > > > > > > > > > > > I'm hopeful someone in the community can help provide some > > information on what I may be missing or help come up with a solution > > to the problem. A big thank you to anyone who would like to help out. > > > > > > > > > > Run into this to a while ago. Found the bug and a few more fixes. > > > > > This is against 4.19 so you may have to tweak them a bit. > > > > > Feel free to upstream them. > > > > > > > > Curious, did my patches help? Good to known once we upgrade as well. > > > > > > There's good news and bad news. > > > > > > The good news is that this appears to stop the driver from entering > > > an infinite loop, which prevents the Linux system from locking up > > > and never recovering. So I'm willing to say we've made the behavior > > > better. > > > > > > The bad news is that once we get past this point, there is new bad > > > behavior. What is on top of this driver in our system is the RNDIS > > > gadget driver communicating to a Laptop running Win10 -1809. > > > Everything appears to work fine with the Linux system until there is > > > a USB disconnect. After the disconnect, the Linux side appears to > > > continue on just fine, but the Windows side doesn't seem to > > > recognize the disconnect, which causes the USB driver on that side > > > to hang forever and eventually blue screen the box. This doesn't happen > > > on > > > all machines, just a select few. I think we can isolate the > > > behavior to a specific antivirus/security software driver that is > > > inserting itself into the USB stack and filtering the disconnect > > > message, but we're still proving that. > > > > > > I'm about 90% certain this is a different problem and we can call > > > this patchset good, at least for our test setup. My only hesitation > > > is if the Linux side is sending a set of responses that are > > > confusing the Windows side (specifically this antivirus) or not. > > > I'd be content calling that a separate defect though and letting > > > this one close up with that patchset. > > > > P.S.: As a Linux kernel regression tracker I'm getting a lot of > > reports on my table. I can only look briefly into most of them. > > Unfortunately therefore I sometimes will get things wrong or miss something > > important. > > I hope that's not the case here; if you think it is, don't hesitate to > > tell me about it in a public reply. That's in everyone's interest, as > > what I wrote above might be misleading to everyone reading this; any > > suggestion I gave they thus might sent someone reading this down the > > wrong rabbit hole, which none of us wants. > > > > BTW, I have no personal interest in this issue, which is tracked using > > regzbot, my Linux kernel regression tracking bot > > (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Feur01.safelinks.protection.outloo&data=04%7C01%7Cjoakim.tjernlund%40infinera.com%7Cb302ff817a8f4b3184c408d9b392bd1c%7C285643de5f5b4b03a1530ae2dc8aaf77%7C1%7C0%7C637738265108962168%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=D2TOGHLaeLnnmbJQG5VEY3CQ66GKtkpBOkFZ16WeW%2F4%3D&reserved=0 > > k.com/?url=https*3A*2F*2Furld__;JSUl!!O7uE89YCNVw!a6nsIMfn544OIzmshw3H > > bMBVcbwor4cV2Q5OsST7-86jy_YZKvDsN-558Ris4wh8Zawz4puN$ > > efense.com%2Fv3%2F__https%3A%2F%2Flinux- > > regtracking.leemhuis.info%2Fregzbot%2F__%3B!!O7uE89YCNVw!aHa5_mLM > > nBeDjINlAtV19tBHm- > > He9jbusXucMA5h7oonHvNFwYpOHAaaqqewPOuGK9HAzJUz%24&data > > =04%7C01%7Cleoyang.li%40nxp.com%7C859ce1560a7344729cea08d9b35d2e > > 67%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6377380350721308 > > 84%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luM > > zIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ONQZyAKXNgok > > 6LgYvnaAL7LVY%2B5Wl7pXglZDqWUJZMc%3D&reserved=0 ). I'm only > > posting this mail to get things rolling again and hence don't need to > > be CC on all further activities wrt to this regression. > > > > #regzbot title: usb: fsl_udc_core: corrupted request list leads to > > unrecoverable loop