Hi Fabio,
On 01/23/2015 12:25 AM, Fabio Estevam wrote:
> On Thu, Jan 22, 2015 at 7:25 PM, Nikolay Dimitrov <[email protected]> wrote:
>> I would appreciate it if you could share ideas about what could be wrong
>> with this setup, and I'd also be happy to hear your suggestions for
>> similar simple tests of system reliability.
> Maybe you could try to run the 'memtester' utility and see how your
> board behaves.
Thanks for the idea. I ran the tool and it also reports errors, but
this happens rarely (just like the hash test) and I'm still looking for
an easy way to reproduce the issue. Here's an example of a memory error:
# memtester 64M 100
memtester version 4.1.3 (32-bit)
Copyright (C) 2010 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffff000
want 64MB (67108864 bytes)
got 64MB (67108864 bytes), trying mlock ...locked.
Loop 1/100:
Stuck Address : ok
Random Value : ok
FAILURE: 0xc3909006 != 0xc3909007 at offset 0x00291fac.
Compare XOR : Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
Memtester can run for hours without finding an issue, while at other
times it reports a memory error after only several minutes.
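A quick way to see exactly which bits differ in a miscompare like the one above is to XOR the two values. A minimal Python sketch; the helper name `flipped_bits` is made up for this illustration and is not part of memtester. Since memtester compares two buffers against each other, neither printed value is necessarily the "correct" one.

```python
from typing import List


def flipped_bits(a: int, b: int) -> List[int]:
    """Return the positions (0 = least significant) of the bits that differ between a and b."""
    diff = a ^ b
    positions = []
    bit = 0
    while diff:
        if diff & 1:
            positions.append(bit)
        diff >>= 1
        bit += 1
    return positions


if __name__ == "__main__":
    # Values from the memtester FAILURE line above.
    a, b = 0xC3909006, 0xC3909007
    print(f"0x{a:08x} vs 0x{b:08x}: differing bits {flipped_bits(a, b)}")
```

For the failure above this reports a single differing bit (bit 0), i.e. a one-bit error rather than a corrupted word.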
I found another tool, stressapptest
(http://stressapptest.googlecode.com/svn/trunk/), which also seems to
trigger the issue. Here's another example of a memory error:
# ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
Log: Commandline - ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
Stats: SAT revision 1.0.7_autoconf, 32 bit binary
Log: picmaster @ riotboard on Fri Jan 23 20:48:49 EET 2015 from open source release
Log: 1 nodes, 2 cpus.
Log: Defaulting to 2 copy threads
Log: Flooring memory allocation to multiple of 4: 64MB
Log: Prefer plain malloc memory allocation.
Log: Using mmap() allocation at 0x72430000.
Stats: Starting SAT, 64M, 300 seconds
Log: region number 1 exceeds region count 1
Log: Region mask: 0x1
Log: Seconds remaining: 240
Log: Seconds remaining: 180
Report Error: miscompare : DIMM Unknown : 1 : 134s
Hardware Error: miscompare on CPU 1(0x2) at 0x74e93040(0x33f0d040:DIMM Unknown): read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a expected:0xaaaaaaaaaaaaaaaa
Report Error: miscompare : DIMM Unknown : 1 : 136s
Hardware Error: miscompare on CPU 0(0x1) at 0x75528710(0x32270710:DIMM Unknown): read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe expected:0xffffffbfffffffbf
Log: Seconds remaining: 120
Log: Seconds remaining: 60
Report Error: miscompare : DIMM Unknown : 1 : 266s
Hardware Error: miscompare on CPU 0(0x1) at 0x74b979d0(0x358ae9d0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
Report Error: miscompare : DIMM Unknown : 1 : 274s
Hardware Error: miscompare on CPU 0(0x1) at 0x73b4cfd0(0x35e8afd0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
Log: Thread 1 found 3 hardware incidents
Log: Thread 2 found 1 hardware incidents
Stats: Found 4 hardware incidents
Stats: Completed: 256346.00M in 300.03s 854.40MB/s, with 4 hardware incidents, 0 errors
Stats: Memory Copy: 256346.00M at 854.46MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s
Status: FAIL - test discovered HW problems
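To look for a pattern across several stressapptest miscompares, the "Hardware Error" lines can be parsed and each read value XORed with the expected one. A sketch in Python; the regex is written against the lines shown in this mail, not against a documented stressapptest log format, and the names `Miscompare` and `parse_miscompares` are hypothetical. The address inside the parentheses is assumed (not confirmed here) to be the physical address. The pattern also tolerates the line wrapping a mail client adds.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Miscompare:
    cpu: int
    address: str  # parenthesised address from the log line (assumed physical)
    read: int
    expected: int

    @property
    def flipped_bits(self) -> List[int]:
        """Bit positions (0 = LSB) where the read value differs from the expected one."""
        diff = self.read ^ self.expected
        return [i for i in range(64) if (diff >> i) & 1]

    @property
    def cleared_bits(self) -> List[int]:
        """Flipped bits that were 1 in the expected pattern but read back as 0."""
        return [i for i in self.flipped_bits if (self.expected >> i) & 1]


# \s+ and [^)]* also match newlines, so mail-wrapped reports still parse.
MISCOMPARE_RE = re.compile(
    r"miscompare on CPU (\d+)\(0x[0-9a-f]+\) at 0x[0-9a-f]+\((0x[0-9a-f]+):[^)]*\):\s+"
    r"read:(0x[0-9a-f]+), reread:0x[0-9a-f]+\s+expected:(0x[0-9a-f]+)"
)


def parse_miscompares(log: str) -> List[Miscompare]:
    """Extract every miscompare report found in a stressapptest log."""
    return [
        Miscompare(int(m.group(1)), m.group(2), int(m.group(3), 16), int(m.group(4), 16))
        for m in MISCOMPARE_RE.finditer(log)
    ]


if __name__ == "__main__":
    sample = (
        "Hardware Error: miscompare on CPU 1(0x2) at 0x74e93040(0x33f0d040:DIMM Unknown): "
        "read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a expected:0xaaaaaaaaaaaaaaaa\n"
        "Hardware Error: miscompare on CPU 0(0x1) at 0x75528710(0x32270710:DIMM Unknown): "
        "read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe expected:0xffffffbfffffffbf\n"
    )
    for mc in parse_miscompares(sample):
        print(f"CPU {mc.cpu} @ {mc.address}: flipped {mc.flipped_bits}, cleared {mc.cleared_bits}")
```

Applied to the four reports above, each miscompare comes out as a single flipped bit (bits 5, 0, 4 and 4), and in every case a bit expected to be 1 was read back as 0.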
I plan to run the FSL DDR stress test again to see whether it detects
any issues with my DDR memory. My board uses a DDR3 SO-DIMM, and I was
also thinking of trying another SO-DIMM module to see whether that makes
any difference.
Thanks for the ideas so far. This is a major problem for me, so I need
to resolve it before doing anything else on this board.
Kind regards,
Nikolay
--
_______________________________________________
meta-freescale mailing list
[email protected]
https://lists.yoctoproject.org/listinfo/meta-freescale