When I was tracking down a timing problem on our SDRAM, I found that doing a native compile of glibc over NFS seems to be a very good memory test.
> -----Original Message-----
> From: linuxppc-embedded-bounces+runet=innovsys.com at ozlabs.org
> [mailto:linuxppc-embedded-bounces+runet=innovsys.com at ozlabs.org] On Behalf Of Kenneth Poole
> Sent: Wednesday, April 19, 2006 07:59
> To: linuxppc-embedded at ozlabs.org
> Subject: kernel access of bad area, sig: 11 ( mpc852t)
>
> > >>> Hi,
> > >>> I'm having a problem porting Linux kernel 2.4.21 to our mpc852T custom
> > >>> board. The kernel panics randomly with sig 11.
> > >>> The board boots up fine and we also get to the prompt. When we open 3-4
> > >>> telnet sessions and try to run some command, the kernel panics. This is
> > >>> completely random. Sometimes it even panics before opening the telnet
> > >>> session.
> > >>>
> > >>> <oops dump snipped>
> > >>>
> > >> You almost certainly have SDRAM problems. If you have thoroughly checked
> > >> out the complete address range statically, remember that burst accesses
> > >> will not occur until the cache is turned on, so your problem may be with
> > >> bursting. But you can also have severe problems like a missing address
> > >> line and Linux still run for a few seconds.
> > >>
> > >> Mark Chambers
> > >
> > > We've checked the SDRAM. The timings (UPM) look fine. The problem,
> > > however, is that Linux does not hang until after a few processes are
> > > started.
> > > If we boot to Linux and leave it as it is, everything is fine and the
> > > board remains working. However, each time a few processes (4-5 telnet
> > > sessions, e.g.) are started, the system either panics or hangs (goes
> > > dead).
> > > Thanks in advance,
> > > Akshay
>
> We have been experiencing this same issue with random boards
> in production. The exact same version of software will run
> for months on other instances of the exact same board design,
> but a few percent get 'random' trap 300s. When they do occur,
> it's only after Linux has booted and address translation and
> caching are turned on. Examining the oopses and memory shows
> that some location in SDRAM has a bogus value, but I don't
> have the tools to trace back how it got that way.
>
> I have ported a rigorous moving-inversions memory test into
> our firmware, and have run it extensively across the entire
> SDRAM address space (the test code executes from flash). I
> have let this test run continuously for hours and hours, but
> never found a memory problem. Unfortunately, I do not have
> test software that enables the MMU address translation or
> caching, so as Mark said, I can't test memory using bursting.
> Our hardware engineers have reviewed the designs very
> carefully and are quite confident that there is plenty of
> margin in the memory timing. Signal quality has also been
> carefully checked.
>
> Our manufacturing people have replaced the CPU on some of
> these boards, and the problem went away.
>
> If anyone else on the mailing list has experienced this
> issue, or has developed a virtual address memory test, please
> let us know.
>
> Ken Poole
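On Ken's question about a virtual address memory test: one option is to run the moving-inversions idea from Linux userspace, where the MMU and data cache are already on. The sketch below is not Ken's firmware test. It assumes ordinary malloc'd user pages are mapped cacheable, as they normally are under Linux. Under that assumption, once the buffer is much larger than the data cache, cache line fills and castouts should reach SDRAM as the burst accesses Mark mentioned. The function names `memtest_moving_inversions` and `memtest_run` and the pattern choice (all zeros, then a walking one) are hypothetical, modelled loosely on memtest86-style moving inversions. Its limitation is that it only exercises whatever physical pages the kernel hands it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * One moving-inversions pass over buf[0..n) with pattern p.
 * Returns the number of mismatching words seen.
 * 'volatile' stops the compiler from keeping values in registers or
 * dropping the read-back, so every access goes to the memory system
 * (through the data cache, if it is enabled).
 */
size_t memtest_moving_inversions(volatile uint32_t *buf, size_t n, uint32_t p)
{
    const uint32_t np = ~p;
    size_t errors = 0;
    size_t i;

    assert(buf != NULL && n > 0);

    /* 1. Fill the whole range with p. */
    for (i = 0; i < n; i++)
        buf[i] = p;

    /* 2. Walk upward: each word must still hold p; replace it with ~p. */
    for (i = 0; i < n; i++) {
        uint32_t v = buf[i];
        if (v != p) {
            fprintf(stderr, "up:   word %lu: want %08lx got %08lx\n",
                    (unsigned long)i, (unsigned long)p, (unsigned long)v);
            errors++;
        }
        buf[i] = np;
    }

    /* 3. Walk downward: each word must hold ~p; restore p. */
    for (i = n; i-- > 0; ) {
        uint32_t v = buf[i];
        if (v != np) {
            fprintf(stderr, "down: word %lu: want %08lx got %08lx\n",
                    (unsigned long)i, (unsigned long)np, (unsigned long)v);
            errors++;
        }
        buf[i] = p;
    }
    return errors;
}

/*
 * Allocate 'bytes' of ordinary user memory and run the pass with an
 * all-zeros pattern (its complement covers all-ones) and then a walking
 * one bit (its complement is a walking zero).
 * Returns total mismatches, or (size_t)-1 if 'bytes' is smaller than
 * one word or malloc fails.
 */
size_t memtest_run(size_t bytes)
{
    size_t n = bytes / sizeof(uint32_t);
    uint32_t *buf;
    size_t total = 0;
    int bit;

    if (n == 0 || (buf = malloc(n * sizeof *buf)) == NULL)
        return (size_t)-1;

    total += memtest_moving_inversions(buf, n, 0x00000000u);
    for (bit = 0; bit < 32; bit++)
        total += memtest_moving_inversions(buf, n, (uint32_t)1u << bit);

    free(buf);
    return total;
}
```

On the target, a small `main` would call `memtest_run` with a size comfortably below free RAM. Running a few copies at once, or alongside the telnet sessions that trigger the panics, adds the same kind of load as the failing case. The first fill pass also forces Linux to back every page of the buffer with real memory.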