Honestly, I haven't spent much time to investigate how the curlen = 0 is
getting generated exactly, because for me it will be very difficult to
understand that without the hardware on my side re-producing the same.

But I had look when the code was introduced to handle curlen == 0 later
in the function:

if (iscontrol) {
        /*
         * adjust the toggle based on the number of packets
         * in this qtd
         */
        if ((((curlen + mps - 1) / mps) & 1) || curlen == 0)
                qtdstatus ^= EHCI_QTD_TOGGLE_MASK;
}

This was introduced by revision 1.57 of ehci.c 14 years ago:

***

If a zero-length bulk or interrupt transfer is requested then assume
USBD_FORCE_SHORT_XFER to ensure that we actually build and execute
a transfer.

Based on changes in FreeBSD rev1.47

***

While the DIAGNOSTIC code to panic at curlen == 0 was introduced with
the first commit of ehci.c.  I think the revision 1.57 should have
removed that DIAGNOSTIC code already, since we obviously can cope
with curlen = 0.

Given that, your below diff would be OK for me.

On Fri, Oct 23, 2020 at 10:09:14AM +0000, Mikolaj Kucharski wrote:

> On Sat, Sep 26, 2020 at 11:00:46PM +0000, Mikolaj Kucharski wrote:
> > On Wed, Sep 16, 2020 at 09:19:37PM +0000, Mikolaj Kucharski wrote:
> > > So time of the crash varies and I guess probably there is no pattern
> > > there. Here is new panic report, with some kernel printf()s added.
> > > 
> > > Problem seems to be related that dataphyspage is greater than
> > > dataphyslastpage: lastpage=0xa0e0000 page=0xa0e1000 phys=0xa0e1000.
> > > The same is observable in all those previous panics.
> > > 
> > 
> > Here ia nother analysis, without my patches, I think they may be
> > confusing. I will show flow of ehci_alloc_sqtd_chain() with values
> > of variables which I collected via my debugging effort.
> > 
> > I think I have understanding what is the flow, but not sure what
> > code should do in those circumstances.
> > 
> 
> I didn't get any feedback on this so far. I'm running custom kernel
> with additional printf()'s to help me understand the flow.
> 
> I ended up with following diff, which I think is safe to do,
> as code handles situation when curlen == 0 further down in
> ehci_alloc_sqtd_chain() function. My machine runs more than
> two weeks with below panic() removed and condition of panic
> is triggered multiple times during uptime and I didn't notice
> any side effects.
> 
> However, flow of ehci_alloc_sqtd_chain() is hard to follow. Well, for me
> anyway. At this stage I don't know how to improve things, beyond below
> diff.
> 
> Any feedback would be appreciated. I don't like running custom kernels,
> so I would like to drop below M from my source checkout.
> 
> 
> Index: ehci.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/ehci.c,v
> retrieving revision 1.211
> diff -u -p -u -r1.211 ehci.c
> --- ehci.c    6 Aug 2020 14:06:12 -0000       1.211
> +++ ehci.c    23 Oct 2020 10:00:35 -0000
> @@ -2407,10 +2407,6 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
>                       curlen -= curlen % mps;
>                       DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, "
>                           "curlen=%u\n", curlen));
> -#ifdef DIAGNOSTIC
> -                     if (curlen == 0)
> -                             panic("ehci_alloc_std: curlen == 0");
> -#endif
>               }
>  
>               DPRINTFN(4,("ehci_alloc_sqtd_chain: dataphys=0x%08x "
> 
> -- 
> Regards,
>  Mikolaj
> 

Reply via email to