Sorry to post in this large email thread. I have most probably hit an erratum on the Freescale T4240 related to PPC_DISABLE_THREADS. I'm using a Rev2 T4240. Does this erratum need to be handled on this revision or not? Any quick help is appreciated!
My issue: I'm running line-rate traffic to the T4240 (10G of traffic on each 10G port). After a few hours of running traffic, my CPU/LMP gets stuck and goes into a hard reset. When I searched through the code and some open forums, I saw the erratum listed above. I'm not sure whether that erratum applies to me or not. I have pasted a snapshot of the issue occurring in my system.

============================================

During the hang, with soft lockup detection enabled, I get prints from smp_many showing 'every processor is waiting for processor 22'; I need to know what happened to processor 22. The processor number keeps changing every time I run the traffic. I feel the issue is that processor 22 either went into a deadlock state with interrupts disabled or went into a deep idle sleep state. If it's a deep idle state, an NMI should have recovered it. But if it's a deadlock with interrupts disabled, then I need to know the root cause of the deadlock.

root@A@0-1-1:~#
[ 1602.261011] Current::17 waiting::22 flag::4353
[ 1602.720488] Current::14 waiting::22 flag::3585
[ 1602.756404] Current::15 waiting::22 flag::3841
[ 1604.903248] Current::16 waiting::22 flag::4097
[ 1619.400917] Current::14 waiting::22 flag::3585
[ 1619.517870] Current::17 waiting::22 flag::4353
[ 1619.749893] Current::15 waiting::22 flag::3841
[ 1619.798453] Current::4 waiting::22 flag::1025
[ 1622.177171] Current::16 waiting::22 flag::4097
[ 1622.449412] INFO: rcu_preempt detected stalls on CPUs/tasks: { 22} (detected by 4, t=21008 jiffies, g=101651, c=101650, q=4713)
[ 1622.460951] Task dump for CPU 22:
[ 1622.464275] swapper/22 R running task 0 0 1 0x00000800
[ 1622.471355] Call Trace:
[ 1622.473820] [c0000001f92b78d0] [000000000000009e] 0x9e (unreliable)
[ 1622.480119] [c0000001f92b7960] [c0000001f92b7aa0] 0xc0000001f92b7aa0
[ 1622.486497] [c0000001f92b79e0] [c000000000a90275] 0xc000000000a90275
[ 1622.492878] [c0000001f92b7a60] [c000000000006f64] .do_IRQ+0x184/0x370
[ 1622.499344] [c0000001f92b7b10] [c00000000001b93c] exc_0x500_common+0xfc/0x100
[ 1622.506539] --- Exception: 501 at 0xc000000000a3e200
[ 1622.506539] LR = .__check_irq_replay+0x68/0x110
[ 1622.516388] [c0000001f92b7e00] [c0000000000bc590] .cpu_startup_entry+0x1d0/0x350 (unreliable)
[ 1622.524950] [c0000001f92b7ed0] [c000000000a00460] .start_secondary+0x3ec/0x3f4
[ 1622.532197] [c0000001f92b7f90] [c00000000000036c] .start_secondary_prolog+0x10/0x14
[ 1635.237829] Current::3 waiting::5 flag::769
[ 1636.587746] Current::14 waiting::22 flag::3585
[ 1637.082117] Current::17 waiting::22 flag::4353
[ 1637.224920] Current::4 waiting::22 flag::1025
[ 1637.268789] Current::15 waiting::22 flag::3841
[ 1640.085792] Current::16 waiting::22 flag::4097
[ 1651.093367] Current::4 waiting::22 flag::1025

=============================================================

Regards,
Kiran

On Thu, May 5, 2016 at 4:40 PM, Arnd Bergmann <a...@arndb.de> wrote:

> On Thursday 05 May 2016 09:41:32 Yangbo Lu wrote:
> > > -----Original Message-----
> > > From: Arnd Bergmann [mailto:a...@arndb.de]
> > > Sent: Thursday, May 05, 2016 4:32 PM
> > > To: linuxppc-dev@lists.ozlabs.org
> > > Cc: Yangbo Lu; linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> > > linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org;
> > > linux-...@vger.kernel.org; linux-...@vger.kernel.org;
> > > iommu@lists.linux-foundation.org; net...@vger.kernel.org; Mark Rutland;
> > > ulf.hans...@linaro.org; Russell King; Bhupesh Sharma; Joerg Roedel;
> > > Santosh Shilimkar; Yang-Leo Li; Scott Wood; Rob Herring; Claudiu Manoil;
> > > Kumar Gala; Xiaobo Xie; Qiang Zhao
> > > Subject: Re: [v10, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
> > >
> > > On Thursday 05 May 2016 11:12:30 Yangbo Lu wrote:
> > > >
> > > > +	fsl_guts_init();
> > > > +	svr = fsl_guts_get_svr();
> > > > +	if (svr) {
> > > > +		esdhc->soc_ver = SVR_SOC_VER(svr);
> > > > +		esdhc->soc_rev = SVR_REV(svr);
> > > > +	} else {
> > > > +		dev_err(&pdev->dev, "Failed to get SVR value!\n");
> > > > +	}
> > > > +
> > > >
> > >
> > > Sorry for jumping in again after not participating in the discussion for
> > > the past few versions.
> > >
> > > What happened to my suggestion of making this a platform-independent
> > > interface to avoid the link time dependency?
> > >
> > > Specifically, why not add an exported function to drivers/base/soc.c that
> > > uses glob_match() for comparing a string in the device driver to the ID
> > > of the SoC that is set by whatever SoC identifying driver the platform
> > > has?
> >
> > [Lu Yangbo-B47093] I think this has been discussed in v6.
> > You can find Scott's comments about this in below link.
> > https://patchwork.kernel.org/patch/8544501/
>
> Ah, thanks for bearing with me and digging this out again. Let me follow
> up on Scott's older replies here then:
>
> > >> IIRC, it is the same IP block as i.MX and Arnd's point is this won't
> > >> even compile on !PPC. It is things like this that prevent sharing the
> > >> driver.
> >
> > The whole point of using the MMIO SVR instead of the PPC SPR is so that
> > it will work on ARM... The guts driver should build on any platform as
> > long as OF is enabled, and if it doesn't find a node to bind to it will
> > return 0 for SVR, and the eSDHC driver will continue (after printing an
> > error that should be removed) without the ability to test for errata
> > based on SVR.
>
> It feels like a bad design to have to come up with a different
> method for each SoC type here when they all do the same thing
> and want to identify some variant of the chip to do device
> specific quirks.
>
> As far as I'm concerned, every driver in drivers/soc that needs to
> export a symbol to be used by a device driver is an indication that
> we don't have the right set of abstractions yet. There are cases
> that are not worth abstracting because the functionality is rather
> obscure and only a couple of drivers for one particular chip
> ever need it.
>
> Finding out the version of the SoC does not look like this case.
>
> > > I think the first four patches take care of building for ARM,
> > > but the problem remains if you want to enable COMPILE_TEST as
> > > we need for certain automated checking.
> >
> > What specific problem is there with COMPILE_TEST?
>
> COMPILE_TEST is solvable here and the way it is implemented in this
> case (selecting FSL_GUTS from the driver) indeed looks like it works
> correctly, but it's still awkward that this means building the
> SoC specific ID stuff into the vmlinux binary for any driver that
> uses something like that for a particular SoC.
>
> > >> Dealing with Si revs is a common problem. We should have a
> > >> common solution. There is soc_device for this purpose.
> > >
> > > Exactly. The last time this came up, I think we agreed to implement a
> > > helper using glob_match() on the soc_device strings. Unfortunately
> > > this hasn't happened then, but I'd still prefer that over yet another
> > > vendor-specific way of dealing with the generic issue.
> >
> > soc_device would require encoding the SVR as a string and then decoding
> > the string, which is more complicated and error prone than having
> > platform-specific code test a platform-specific number.
>
> You already need to encode it as a string to register the soc_device,
> and the driver just needs to pass a glob string, so the only part that
> is missing is the generic function that takes the string from the
> driver and passes that to glob_match for the soc_device.
>
> > And when would it get registered on arm64, which doesn't have
> > platform code?
>
> Whenever the soc driver is loaded, as is the case now. The match
> function can return -EPROBE_DEFER if no SoC device is registered
> yet.
>
> Arnd
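
For anyone following the quoted discussion, here is a minimal userspace sketch of the kind of helper Arnd describes: the SoC driver registers an ID string, a device driver passes a glob pattern, and the match call defers when nothing is registered yet. This is only an illustration under stated assumptions, not the kernel implementation. The names soc_id_register() and soc_id_match() and the ID string format are hypothetical, POSIX fnmatch() stands in for the kernel's glob_match(), and EPROBE_DEFER is defined locally because userspace errno.h does not provide it.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Kernel-internal errno value; not available in userspace headers. */
#define EPROBE_DEFER 517

/* Hypothetical: ID string set by the SoC-identifying driver, NULL until then. */
static const char *registered_soc_id;

/* Hypothetical registration hook, called once the SoC driver has loaded. */
void soc_id_register(const char *id)
{
	registered_soc_id = id;
}

/*
 * Hypothetical match helper for device drivers.
 * Returns 1 if the registered ID matches the glob pattern, 0 if it does
 * not, and -EPROBE_DEFER if no SoC ID has been registered yet.
 */
int soc_id_match(const char *pattern)
{
	assert(pattern != NULL);

	if (!registered_soc_id)
		return -EPROBE_DEFER;

	/* fnmatch() returns 0 on a match; it stands in for glob_match(). */
	return fnmatch(pattern, registered_soc_id, 0) == 0;
}
```

With something like this, a driver would call soc_id_match() with a pattern such as "T4240 rev [12].0" (an assumed string format) in its probe path and retry later on -EPROBE_DEFER, instead of linking against a vendor-specific SVR accessor.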