Helmut Jarausch wrote:
> On 24 Nov, Stefan G. Weichinger wrote:
>> Stefan G. Weichinger wrote:
>>> Stefan G. Weichinger wrote:
>>>
>>>> Since then no crashes, but I would have to test clicking some more stuff
>>>> to really believe ...
>>> As always, after hitting SEND ... one more crash ...
>> Sometimes it crashes after clicking opera, sometimes after clicking
>> thunderbird, so far never when clicking/starting a gnome-terminal.
>>
>> I am still looking for a pattern or an error-message somewhere ...
>>
> 
> This reminds me of a problem we had just recently.
> Have you got a multi-core CPU ?
> If yes, read on.
> 
> We have 6 machines here running an identical Gentoo system
> (just different hostname and IP number)
> with an AMD Phenom II quad-core CPU and identical motherboards.
> One of them had the random crashes you reported.
> I tortured the memory by running up to 3 memtester processes
> overnight - not a single fault. Our dealer replaced the motherboard -
> again no change. Then I suspected the CPU itself, although it had
> survived a burnK7 run for several hours.
> 
> After the CPU was replaced, the spook was gone.
> I suspect a cache coherence problem. The normal memory tests
> assign a given window of the physical storage to a given core -
> even if run in parallel. But a typical usage under Linux switches 
> the core which executes a given thread quite frequently.
> Now the Phenom II has 4 cores, each with a private 0.5 MB primary cache
> but a 6 MB second-level cache common to all 4 cores.
> In the BIOS one can opt for all 4 cores using this secondary cache
> or for only a single core using it.
> When a core writes to this cache or to memory all other cores must be
> informed that their private cache is invalid. If this doesn't happen or
> happens a bit too late, a core will fetch invalid (old) memory contents
> which may result in a crash.
> So, if you can, set the BIOS switch so that only a single core
> can use the secondary cache. If the problems disappear,
> the CPU is broken.
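
For anyone who wants to poke at the core-switching idea described above
without touching the BIOS, here is a minimal sketch, assuming Linux and
Python 3. It writes a pattern while pinned to one core, moves itself to the
next core, and reads the pattern back. The function name migrate_and_verify
and its parameters are made up for illustration; os.sched_setaffinity is the
only system facility used. It is far too slow and too shallow to replace
memtester, and a clean run proves nothing about the hardware.

```python
import os


def migrate_and_verify(size=1 << 20, passes=4):
    """Write a pattern on one core, then re-read it on another core.

    Returns the number of mismatching bytes (0 is expected on healthy
    hardware). Hypothetical helper, for illustration only.
    """
    # The affinity calls are Linux-only; without them we simply do not pin.
    can_pin = hasattr(os, "sched_setaffinity")
    cpus = sorted(os.sched_getaffinity(0)) if can_pin else [0]
    original = set(cpus)
    errors = 0
    try:
        for p in range(passes):
            for i, writer in enumerate(cpus):
                reader = cpus[(i + 1) % len(cpus)]
                if can_pin:
                    os.sched_setaffinity(0, {writer})   # pin to the writing core
                pattern = bytes([(p * 37 + i) & 0xFF])  # one-byte fill pattern
                buf = bytearray(pattern * size)         # fill the buffer
                if can_pin:
                    os.sched_setaffinity(0, {reader})   # migrate to the next core
                # Count bytes that no longer match what was written.
                errors += size - buf.count(pattern)
    finally:
        if can_pin:
            os.sched_setaffinity(0, original)           # restore the old mask
    return errors


if __name__ == "__main__":
    print("mismatching bytes:", migrate_and_verify())
```

Running it in a loop overnight would be the obvious way to use it; any
non-zero count would be worth a closer look.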

Phew, quite some theory ... do you positively know that this was the reason?

I think I haven't seen such a setting in my BIOS.

I use an Intel Core2Duo E6600 on an Intel DP965LT board here, 8 gigs of
RAM lately ...

BUT my issues really only started after completely going to ~amd64, I
never saw such a crash before when I used a mixed setup (most pkgs
stable, some unstable ...)

I will have a look at my BIOS now.

Thanks anyway for that information, greets to Aachen (from Austria) ...

Stefan
