On 2014-07-01 17:33, O. Hartmann wrote:
On Tue, 01 Jul 2014 17:23:14 +0200
Willem Jan Withagen <w...@digiware.nl> wrote:

On 2014-07-01 16:48, Rang, Anton wrote:
DOT => DOD

444F54 => 444F44

That's a single-bit flip.  Bad memory, perhaps?
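To make the arithmetic explicit: 'T' is 0x54 (0101 0100) and 'D' is 0x44 (0100 0100), so XORing them gives 0x10, which has exactly one bit set (bit 4). A small Python sketch that compares the expected and observed bytes and lists the bit positions that changed; the helper name `differing_bits` is hypothetical, chosen only for illustration:

```python
# Compare expected vs. observed bytes and report which bits differ.
# A corruption that changes exactly one bit is a single-bit flip.

def differing_bits(expected: bytes, observed: bytes) -> list[tuple[int, int]]:
    """Return (byte_index, bit_position) pairs where the inputs differ.

    bit_position 0 is the least significant bit of the byte.
    """
    if len(expected) != len(observed):
        raise ValueError("inputs must be the same length")
    flips = []
    for i, (a, b) in enumerate(zip(expected, observed)):
        diff = a ^ b  # XOR leaves a 1 in every bit position that changed
        for bit in range(8):
            if diff & (1 << bit):
                flips.append((i, bit))
    return flips


if __name__ == "__main__":
    expected, observed = b"DOT", b"DOD"
    print(expected.hex().upper(), "=>", observed.hex().upper())  # 444F54 => 444F44
    flips = differing_bits(expected, observed)
    print(flips)  # [(2, 4)]: third byte, bit 4
    print("single-bit flip" if len(flips) == 1 else f"{len(flips)} bits differ")
```

A single differing bit in an otherwise intact string is the classic signature of a flipped memory cell rather than a software bug, which would more typically mangle whole bytes or words.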

Very likely, especially if the system does not have ECC...
On rare occasions an alpha particle, a power cycle, or some other
disruption simply damages a memory cell. It may also take a particular
pattern of accesses for the error to actually show up.

In the past (the 1990s), 'make buildworld' used to be a rather good memory
tester. Nowadays, look at
        http://www.memtest.org/

This tool has found all of the bad memory in every system I have used
or built for others...
Note that it might take a few runs and some more heat to actually
trigger the faulty cell, but memtest86 will usually find it.

Note that on big systems with lots of memory it can take a loooooong
time to run just one full testset to completion.

--WjW

I already tested with memtest86+ (I had to download the Linux image; the port
on FreeBSD is broken on CURRENT). It hasn't found anything strange so far.

I will do another test.

I realised that on that specific box, the chipset temperature is 81 degrees
Celsius. The chipset is an Eaglelake P45, which on that old platform houses
the memory controller. dmidecode gives:

         Manufacturer: ASUSTeK Computer INC.
         Product Name: P5Q-WS
         Version: Rev 1.xx

Hi Oliver,

I've built several (5+) systems with these boards (from memory, they date from around 2009?). If I recall correctly, just one of them is still functional. The first one broke down within a couple of weeks, and the others did not survive over time either.

The auxiliary chips on that board do run hot, but I never realized they ran this hot. Is 81C the CPU temperature from sysctl, or did you measure the heatsink on the motherboard? In the latter case it is probably just too hot. But even if it is the temperature of the chip itself, I've rarely seen temperatures go this high.

You may need to run memtest86 for more than 6-10 complete passes with all the tests enabled.

If the memory tests do not reveal anything broken, you get into even deeper wizardry, like bad power etc. Especially since the problem only occurs occasionally, finding its root cause is going to be a nightmare, short of replacing hardware piece by piece, which won't be easy given the age of the board and parts.

You could go into the BIOS, configure RAM access at a slower speed, and see whether the problem goes away. If it does, you may be running at the edge of the spec with regard to RAM timing.

But as I said, there are lots of funky details that can interact in strange and unexpected ways.

--WjW

_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
