Re: [gentoo-user] Re: Plasma session saving

2023-07-06 Thread Peter Humphrey
On Wednesday, 5 July 2023 22:46:33 BST mad.scientist.at.la...@tutanota.com 
wrote:

...>8

> Jul 5, 2023, 11:50 by grant.b.edwa...@gmail.com:

I don't have that posting by Grant here.

> > On 2023-07-05, Peter Humphrey  wrote:
> >> This version of memtest86 ran to completion after going through the whole
> >> 64GB, and stopped with a success message.
> > 
> > That's a pretty good sign, but I have seen memory that made it through
> > one complete test pass and failed on subsequent ones.

So I ran another test last night, with the same result.

In fact, I can't remember ever having a memory problem exposed by a memory 
test.
 
> >> Over the last...oh, many months, I've noticed an occasional package in a
> >> large batch failing for no obvious reason, only to succeed on its own.
> > 
> > What sort of failure?  I've found that inconsistent/random gcc
> > internal errors or gcc segfaults have usually been due to failing
> > RAM. [Though in one case I remember, it was due to a failing SCSI disc
> > controller card -- back when that was a thing.]

Various minor things, such as some component not being found.

> > It might also be due to a failing disk, but there are usually good
> > indications of that in dmesg output and in SMART logs before it starts
> > to affect other things.

Hm. I'll have a look around the various tools.

Thanks for the ideas.

-- 
Regards,
Peter.






Re: [gentoo-user] Re: Plasma session saving

2023-07-05 Thread mad . scientist . at . large
It could also indicate a failing power supply.  I've seen this a number of 
times, and it often manifests as memory errors when testing the ram.  
 

Any number of things in the computer can fail in ways that may not be so 
obvious.  Substitution troubleshooting may be needed, i.e. try a known good 
power supply with known good memory, or take half the ram out to see if the 
problem persists, then check the other half of the ram.

It's also worth pulling and reseating the ram and any cards in the machine.  
I've got a big, huge server that was having issues.  It has a removable drawer 
for the cpu/memory; I pulled it out about 1/2 inch and reseated it, and the 
errors stopped.  That was a couple of months ago.  It's probably a good idea to 
reseat the cpu as well.  Finally, you should also check the fans and dustiness 
of the computer in question; failing fans and dust buildup can both produce 
higher temps and random behavior.

And yes, it's a pain to properly test large amounts of ram, especially if you 
don't have a backup machine to work on while the other is testing.


--"Fascism begins the moment a ruling class, fearing the people may use their 
political democracy to gain economic democracy, begins to destroy political 
democracy in order to retain its power of exploitation and special privilege." 
Tommy Douglas




Jul 5, 2023, 11:50 by grant.b.edwa...@gmail.com:

> On 2023-07-05, Peter Humphrey  wrote:
>
>> This version of memtest86 ran to completion after going through the whole 
>> 64GB, and stopped with a success message.
>>
>
> That's a pretty good sign, but I have seen memory that made it through
> one complete test pass and failed on subsequent ones.
>
>> Over the last...oh, many months, I've noticed an occasional package in a
>> large batch failing for no obvious reason, only to succeed on its own.
>>
>
> What sort of failure?  I've found that inconsistent/random gcc
> internal errors or gcc segfaults have usually been due to failing
> RAM. [Though in one case I remember, it was due to a failing SCSI disc
> controller card -- back when that was a thing.]
>
> It might also be due to a failing disk, but there are usually good
> indications of that in dmesg output and in SMART logs before it starts
> to affect other things.
>
> --
> Grant
>




[gentoo-user] Re: Plasma session saving

2023-07-05 Thread Grant Edwards
On 2023-07-05, Peter Humphrey  wrote:

> This version of memtest86 ran to completion after going through the whole 
> 64GB, and stopped with a success message.

That's a pretty good sign, but I have seen memory that made it through
one complete test pass and failed on subsequent ones.

> Over the last...oh, many months, I've noticed an occasional package in a
> large batch failing for no obvious reason, only to succeed on its own.

What sort of failure?  I've found that inconsistent/random gcc
internal errors or gcc segfaults have usually been due to failing
RAM. [Though in one case I remember, it was due to a failing SCSI disc
controller card -- back when that was a thing.]

It might also be due to a failing disk, but there are usually good
indications of that in dmesg output and in SMART logs before it starts
to affect other things.
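
For the dmesg side, here is a rough Python sketch of the kind of filter that
might help pick out suspicious lines. The pattern list and the name
suspect_lines are illustrative assumptions only; real kernel messages vary a
lot between drivers and kernel versions, so treat this as a starting point
rather than a complete check.

```python
import re

# Hypothetical patterns, chosen for illustration; they are not an
# exhaustive list of what a failing disk, controller or RAM can produce.
SUSPECT_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"segfault", re.IGNORECASE),
    re.compile(r"machine check", re.IGNORECASE),
    re.compile(r"\bata\d+(\.\d+)?: .*(error|failed)", re.IGNORECASE),
]


def suspect_lines(log_text):
    """Return the lines of log_text that match any suspect pattern."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS)
    ]
```

One way to use it would be to save the kernel log with "dmesg > dmesg.txt"
(this may need root on some systems) and then call
suspect_lines(open("dmesg.txt").read()). An empty result does not prove the
hardware is healthy; it only means none of these particular patterns showed up.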

--
Grant