On Mon, 23 Aug 2010 14:20:35 +0200, John Baldwin <[email protected]> wrote:
On Monday, August 23, 2010 2:44:38 am Andriy Gapon wrote:
on 23/08/2010 05:05 Dan Langille said the following:
> On 8/22/2010 9:18 PM, Dan Langille wrote:
>> What does this mean?
>>
>> kernel: MCA: Bank 4, Status 0x940c4001fe080813
>> kernel: MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
>> kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
>> kernel: MCA: CPU 0 COR BUSLG Source RD Memory
>> kernel: MCA: Address 0x7ff6b0
>>
>> FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43
>
> And another one:
>
> kernel: MCA: Bank 4, Status 0x9459c0014a080813
> kernel: MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
> kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
> kernel: MCA: CPU 0 COR BUSLG Source RD Memory
> kernel: MCA: Address 0x7ff670
I believe you are getting correctable RAM ECC errors, but I'm not entirely
sure.
There is an mcelog utility that decodes such messages into human-friendly
descriptions. It is available on Linux-based systems.
John Baldwin has a FreeBSD port of it, but it seems to be a work in
progress and is private so far. Wait and watch John post the decoded
text in this thread :-)
It is not private; it is in //depot/projects/mcelog/... in p4. It is not a
complete port yet, though (it doesn't support the daemon and client modes,
for example).
Details for these errors:
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge
ADDR 7ff6b0
Northbridge RAM Chipkill ECC error
Chipkill ECC syndrome = fe18
bit32 = err cpu0
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 940c4001fe080813 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 5
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge
ADDR 7ff670
Northbridge RAM Chipkill ECC error
Chipkill ECC syndrome = 4ab3
bit32 = err cpu0
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 9459c0014a080813 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 5
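For anyone curious where the northbridge-specific lines above come from, here is a minimal sketch that pulls those fields out of the raw STATUS value. The bit positions are an assumption taken from AMD's family 0Fh (K8) BIOS and Kernel Developer's Guide layout for MC4_STATUS (extended error code in bits 19:16, syndrome[15:8] in bits 31:24, reporting-core mask in bits 35:32, uncorrectable/correctable ECC flags in bits 45/46, syndrome[7:0] in bits 54:47). The function name is hypothetical, and this is not mcelog's code. It does reproduce the "syndrome", "err cpu0" and "corrected ecc error" values mcelog printed for both STATUS words.

```python
# Hypothetical helper, not part of mcelog: decode AMD family 0Fh (K8)
# northbridge MC4_STATUS fields. Bit positions are assumed from AMD's
# family 0Fh BKDG.

def decode_k8_nb_status(status: int) -> dict:
    """Extract the K8 northbridge-specific fields that mcelog reports."""
    ext_code = (status >> 16) & 0xF      # bits 19:16: extended error code
    synd_hi = (status >> 24) & 0xFF      # bits 31:24: chipkill syndrome[15:8]
    err_cpu = (status >> 32) & 0xF       # bits 35:32: one bit per reporting core
    unc_ecc = bool((status >> 45) & 1)   # bit 45: uncorrectable ECC error
    corr_ecc = bool((status >> 46) & 1)  # bit 46: correctable ECC error
    synd_lo = (status >> 47) & 0xFF      # bits 54:47: ECC syndrome[7:0]
    return {
        "ext_code": ext_code,
        "syndrome": (synd_hi << 8) | synd_lo,
        "err_cpu": err_cpu,
        "uncorrected_ecc": unc_ecc,
        "corrected_ecc": corr_ecc,
    }


# The two STATUS values from the kernel messages in this thread.
for status in (0x940C4001FE080813, 0x9459C0014A080813):
    f = decode_k8_nb_status(status)
    print(f"{status:016x}: ext={f['ext_code']:#x} syndrome={f['syndrome']:04x} "
          f"err_cpu={f['err_cpu']:#x} corrected={f['corrected_ecc']}")
```

Both values decode to extended error code 0x8 (which mcelog reports as "RAM Chipkill ECC error"), err_cpu bit 0 set, and the corrected-ECC flag set. The syndromes come out as fe18 and 4ab3, matching the mcelog text above.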
As Andriy guessed, I believe both of these are corrected ECC errors. You
can likely ignore them, as a low rate of corrected ECC errors is not
unexpected.
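The "corrected" verdict can be checked from the architectural part of the STATUS register, which is common to Intel and AMD machine-check banks. Here is a minimal sketch, assuming the standard layout (VAL, OVER, UC, EN, MISCV, ADDRV and PCC in bits 63 to 57) and the bus/interconnect compound error code format 0000 1PPT RRRR IILL in the low 16 bits, as described in the Intel SDM machine-check chapter. The function and table names are hypothetical.

```python
# Hypothetical sketch: decode architectural MCi_STATUS flags and a
# bus/interconnect compound error code (format 0000 1PPT RRRR IILL).
# Layout assumed from the Intel SDM machine-check chapter.

PARTICIPATION = ["SRC", "RES", "OBS", "generic"]         # PP, bits 10:9
REQUEST = ["generic", "RD", "WR", "DRD", "DWR", "IRD",
           "PREFETCH", "EVICT", "SNOOP"]                # RRRR, bits 7:4
MEM_IO = ["memory", "reserved", "I/O", "other"]          # II, bits 3:2
LEVEL = ["L0", "L1", "L2", "generic"]                    # LL, bits 1:0


def decode_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and a bus-error decode."""
    flags = {
        "valid": bool((status >> 63) & 1),        # VAL: register holds an error
        "overflow": bool((status >> 62) & 1),     # OVER: an earlier error was lost
        "uncorrected": bool((status >> 61) & 1),  # UC: hardware did not correct it
        "enabled": bool((status >> 60) & 1),      # EN: reporting enabled for it
        "misc_valid": bool((status >> 59) & 1),   # MISCV: MCi_MISC is valid
        "addr_valid": bool((status >> 58) & 1),   # ADDRV: MCi_ADDR is valid
        "pcc": bool((status >> 57) & 1),          # PCC: processor context corrupt
    }
    code = status & 0xFFFF
    bus = None
    if (code & 0xF800) == 0x0800:  # top five bits 00001 mark a bus error
        rrrr = (code >> 4) & 0xF
        bus = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 1),
            "request": REQUEST[rrrr] if rrrr < len(REQUEST) else f"reserved({rrrr})",
            "mem_io": MEM_IO[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return {"flags": flags, "code": code, "bus": bus}


print(decode_status(0x940C4001FE080813))
```

For both STATUS values in this thread, UC is clear, VAL/EN/ADDRV are set, and the 0x0813 error code decodes as a locally originated (SRC), non-timed-out generic read (RD) of memory at a generic cache level. That appears to be what the kernel abbreviates as "COR BUSLG Source RD Memory".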
Hi,
A little off topic, but what counts as 'a low rate of corrected ECC
errors'? At work, one machine gets them about once per day but runs OK.
Is once per day a lot?
Ronald.
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"