On Mon, Sep 21, 2015 at 06:56:34PM +0530, [email protected] wrote:
> On 15-09-21 14:50:18, Peter Chen wrote:
> > On Fri, Sep 18, 2015 at 04:01:50PM +0530, [email protected] wrote:
> > > On 15-09-18 13:39:11, Peter Chen wrote:
> > > > On Wed, Sep 16, 2015 at 02:48:50PM +0530, [email protected] wrote:
> > > > > On 15-09-16 15:54:21, Peter Chen wrote:
> > > > > > On Wed, Sep 16, 2015 at 02:18:49PM +0530, [email protected] wrote:
> > > > > > > Hello Peter,
> > > > > > >
> > > > > > > >
> > > > > > > > Enable CONFIG_DEBUG_LIST, it has below position if you run
> > > > > > > > make menuconfig
> > > > > > > > Kernel hacking --->
> > > > > > > > [*] Debug linked list manipulation
> > > > > > > >
> > > > > > >
> > > > > > > Sorry for the delay. When I enabled this config the first time
> > > > > > > my test application ran for 24 hours or so and I did not get
> > > > > > > any stack traces.
> > > > > > >
> > > > > > > I restarted the test again and finally got the trace below.
> > > > > > > You were spot on, its a list corruption issue. I modified the
> > > > > > > trace a bit after copying to remove the sprinkled debug
> > > > > > > messages throughout the trace from my test application.
> > > > > > >
> > > > > > > [ 622.204134] WARNING: CPU: 0 PID: 0 at lib/list_debug.c:59 __list_del_entry+0xc4/0xe8()
> > > > > > > [ 622.212870] list_del corruption. prev->next should be 8db63600, but was 36008db6
> > > > > >
> > > > > > You see the higher 16 bits were swapped with lower 16 bits, and
> > > > > > the virtual memory address should begin from 0x8xxxxxxxx, right?
> > > > >
> > > > > Yes, I saw that but beats me how this happens.
> > > > >
> > > > > >
> > > > > > Check with Vybrid errata to see if all ARM/memory system have
> > > > > > applied.
> > > > >
> > > > > What do you mean by "all ARM/memory system have applied" ?
> > > > > I checked with the Vybrid errata and I do not see anything
> > > > > related.
> > > > >
> > > >
> > > > Just system level errata, like ARM Cortex A5, memory (L1/L2 Cache),
> > > > etc.
> > > >
> > > > Would you please do more tests to see if the error pattern is always
> > > > the same?
> > >
> > > I got more or less the same logs as below the last five times I tried
> > > today and this time I got the crashes quickly enough somehow. Did not
> > > have to wait for more than half an hour.
> > >
> > > > And print the address to store prev-next.
> > >
> > > Isn't that what's given by list_del corruption info?
> >
> > It only prints the content of prev->next, not without the address of
> > prev->next, I just want to make sure this address is dword aligned.
>
> Ok.
>
> > > [ 476.880749] list_del corruption. prev->next should be 8daf74c0, but was 74c08daf
> > >
> > > Interesting that atleast one more person Felipe Tonello sees the same
> > > issue.
> > >
> > > Felipe mentions a DMA issue, I saw a DMA error message from ci_hdrc
> > > once in the last five times I tried but mistakenly I did not take that
> > > one down. The message was something along the lines
> > > "ci_hdrc: ci_hdrc bad dma alloc" or similar.
> >
> > Make sure you really see dma_pool_alloc fail or not, it may not the same
> > problem
>
> That message was exactly
>
> [ 1186.114496] ci_hdrc ci_hdrc.0: dma_pool_free ci_hw_td, (null)/8d3c1e6c (bad dma)
>

Does the above message appear right around the time of the linked list
corruption, or does it also show up while transfers are otherwise
completing correctly?

-- 

Best Regards,
Peter Chen
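
For anyone following the thread, the pattern in the two quoted list_del reports (expected value with its upper and lower 16 bits exchanged) and the dword alignment question can be checked mechanically. This is only a minimal userspace sketch, not kernel code: the helper names swap_halfwords and is_dword_aligned are made up for illustration, and it assumes 32-bit pointers as on the Vybrid.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): exchange the upper and lower
 * 16 bits of a 32-bit value, e.g. 0x8db63600 -> 0x36008db6.
 */
static uint32_t swap_halfwords(uint32_t v)
{
	return (v << 16) | (v >> 16);
}

/*
 * Hypothetical helper (not a kernel API): true when addr is a multiple
 * of 4, i.e. dword aligned in the sense used in this thread.
 */
static bool is_dword_aligned(uint32_t addr)
{
	return (addr & 0x3u) == 0;
}
```

Feeding in the expected values from both reports (8db63600 and 8daf74c0) gives exactly the corrupted values that were logged (36008db6 and 74c08daf), so the two crashes share one pattern. The alignment helper is meant for the address of prev->next once it is printed, which the stock list_del message does not include.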
