Re: [gentoo-user] Re: Plasma session saving
On Wednesday, 5 July 2023 22:46:33 BST mad.scientist.at.la...@tutanota.com wrote:

...>8

> Jul 5, 2023, 11:50 by grant.b.edwa...@gmail.com:

I don't have that posting by Grant here.

> > On 2023-07-05, Peter Humphrey wrote:
> >> This version of memtest86 ran to completion after going through the whole
> >> 64GB, and stopped with a success message.
> >
> > That's a pretty good sign, but I have seen memory that made it through
> > one complete test pass and failed on subsequent ones.

So I ran another test last night, with the same result. In fact, I can't remember ever having a memory problem exposed by a memory test.

> >> Over the last...oh, many months, I've noticed an occasional package in a
> >> large batch failing for no obvious reason, only to succeed on its own.
> >
> > What sort of failure? I've found that inconsistent/random gcc
> > internal errors or gcc segfaults have usually been due to failing
> > RAM. [Though in one case I remember, it was due to a failing SCSI disc
> > controller card -- back when that was a thing.]

Various minor things, such as some component not being found.

> > It might also be due to a failing disk, but there are usually good
> > indications of that in dmesg output and in SMART logs before it starts
> > to affect other things.

Hm. I'll have a look around the various tools. Thanks for the ideas.

-- 
Regards,
Peter.
Re: [gentoo-user] Re: Plasma session saving
It could also indicate a failing power supply. I've seen this a number of times, and it often manifests as memory errors when testing the RAM. Any number of things in the computer can fail in ways that may not be so obvious. Substitution troubleshooting may be needed, i.e. try a known good power supply with known good memory, or take half the RAM out to see if the problem persists, then check the other half of the RAM.

It's also worth pulling and reseating the RAM and any cards in the machine. I've got a big server that was having issues; it has a removable drawer for the CPU/memory. I pulled it out about half an inch and reseated it, and the errors stopped. That was a couple of months ago. It's probably a good idea to reseat the CPU as well.

Finally, you should also check the fans and dustiness of the computer in question, both of which can produce higher temps and random behavior. And yes, it's a pain to properly test large amounts of RAM, especially if you don't have a backup machine to work on while the other is testing.

-- 
"Fascism begins the moment a ruling class, fearing the people may use their political democracy to gain economic democracy, begins to destroy political democracy in order to retain its power of exploitation and special privilege." Tommy Douglas

Jul 5, 2023, 11:50 by grant.b.edwa...@gmail.com:

> On 2023-07-05, Peter Humphrey wrote:
>
>> This version of memtest86 ran to completion after going through the whole
>> 64GB, and stopped with a success message.
>
> That's a pretty good sign, but I have seen memory that made it through
> one complete test pass and failed on subsequent ones.
>
>> Over the last...oh, many months, I've noticed an occasional package in a
>> large batch failing for no obvious reason, only to succeed on its own.
>
> What sort of failure? I've found that inconsistent/random gcc
> internal errors or gcc segfaults have usually been due to failing
> RAM. [Though in one case I remember, it was due to a failing SCSI disc
> controller card -- back when that was a thing.]
>
> It might also be due to a failing disk, but there are usually good
> indications of that in dmesg output and in SMART logs before it starts
> to affect other things.
>
> --
> Grant
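For what it's worth, the "take half the RAM out" approach in the message above is a bisection, and it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slot labels and the `passes_test` callable are hypothetical stand-ins for physically fitting a subset of modules and running a memory test, and the sketch assumes exactly one bad module whose fault shows up reliably. Since memory can pass one run and fail a later one, a real `passes_test` would probably need several passes before it is trusted.

```python
def find_bad_module(modules, passes_test):
    """Narrow down a single faulty module by testing half of the set at a time.

    `modules` is a list of labels (e.g. DIMM slot names) and `passes_test`
    is a callable that takes the list of modules currently fitted and
    returns True if the machine tested clean. Both are hypothetical.
    Assumes exactly one bad module and a reproducible fault.
    """
    candidates = list(modules)
    if not candidates:
        raise ValueError("need at least one module to test")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if passes_test(half):
            # The fitted half tested clean, so the fault is in the other half.
            candidates = candidates[len(half):]
        else:
            candidates = half
    return candidates[0]


if __name__ == "__main__":
    # Simulated machine: pretend the module in slot "B1" is the faulty one.
    slots = ["A1", "A2", "B1", "B2"]
    simulated_test = lambda fitted: "B1" not in fitted
    print(find_bad_module(slots, simulated_test))
```

With four modules this takes two test runs instead of up to four, which matters when each run means hours of testing on 64GB.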
[gentoo-user] Re: Plasma session saving
On 2023-07-05, Peter Humphrey wrote:

> This version of memtest86 ran to completion after going through the whole
> 64GB, and stopped with a success message.

That's a pretty good sign, but I have seen memory that made it through one complete test pass and failed on subsequent ones.

> Over the last...oh, many months, I've noticed an occasional package in a
> large batch failing for no obvious reason, only to succeed on its own.

What sort of failure? I've found that inconsistent/random gcc internal errors or gcc segfaults have usually been due to failing RAM. [Though in one case I remember, it was due to a failing SCSI disc controller card -- back when that was a thing.]

It might also be due to a failing disk, but there are usually good indications of that in dmesg output and in SMART logs before it starts to affect other things.

-- 
Grant
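As a rough illustration of looking for disk trouble in dmesg output, here is a minimal Python sketch that filters saved kernel log text for a few strings that commonly accompany failing disks. The pattern list is an assumption and far from exhaustive (real messages vary by driver and kernel version), and the sample log lines are made up for the example. It is not a substitute for reading the SMART logs themselves.

```python
import re

# Illustrative patterns only (an assumption, not a complete list);
# real kernel messages vary by driver and kernel version.
DISK_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"failed command", re.IGNORECASE),
    re.compile(r"medium error", re.IGNORECASE),
    re.compile(r"exception Emask", re.IGNORECASE),
]


def find_disk_errors(log_text):
    """Return the log lines that match any of the patterns above."""
    hits = []
    for line in log_text.splitlines():
        if any(p.search(line) for p in DISK_ERROR_PATTERNS):
            hits.append(line.strip())
    return hits


# Made-up sample input, standing in for saved dmesg output.
SAMPLE_LOG = """\
[    1.234567] ata1: SATA link up 6.0 Gbps
[  812.000001] ata1.00: failed command: READ FPDMA QUEUED
[  812.000321] blk_update_request: I/O error, dev sda, sector 123456
[  900.000000] EXT4-fs (sda2): mounted filesystem
"""

if __name__ == "__main__":
    for hit in find_disk_errors(SAMPLE_LOG):
        print(hit)
```

To try it on a real machine, the sample text could be replaced with the contents of a file holding saved dmesg output. An empty result only means none of these particular strings appeared.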