Re: [gentoo-user] SegFault while compiling gcc 4.1.1
On Thursday 30 November 2006 07:56, Vladimir G. Ivanovic wrote:

> Let's take a poll.
>
> 1. Have you seen this error message in an emerge?

Yes, several times.

> 2. Have you subsequently identified a hardware problem, fixed the hardware problem, and have not seen the message since?

Yes. 99% of the time it was bad RAM (verified with memtest86). Of course, for trivial emerges a subsequent emerge completed fine, but the first failure put me on the alert.

> 3. Have you re-run the emerge and not seen the message in a while (please indicate how long a while is.)

For me, a while is since fixing the hardware problem.

> BTW, do you know portage/emerge/make/whatever knows that the problem is not reproducible?

If, all other things being equal, a subsequent attempt at the same operation does not exhibit the problem, or fails differently, there's a good chance that the problem is not reproducible.

--
gentoo-user@gentoo.org mailing list
Re: [gentoo-user] SegFault while compiling gcc 4.1.1
On 11/29/06, Vladimir G. Ivanovic wrote:

> 1. Have you seen this error message in an emerge?

Yes.

> 2. Have you subsequently identified a hardware problem, fixed the hardware problem, and have not seen the message since?

Yes. The problem was memory timings...or more specifically the RAM didn't really work as fast as its manufacturer claimed. Dropping the memory timings, and later replacing the RAM, fixed the problem.

> 3. Have you re-run the emerge and not seen the message in a while (please indicate how long a while is.)

No.

--
gentoo-user@gentoo.org mailing list
Re: [gentoo-user] SegFault while compiling gcc 4.1.1
On Thursday 30 November 2006 09:02, Etaoin Shrdlu wrote:

> On Thursday 30 November 2006 07:56, Vladimir G. Ivanovic wrote:
> > Let's take a poll.
> >
> > 1. Have you seen this error message in an emerge?
>
> Yes, several times.

Ditto.

> > 2. Have you subsequently identified a hardware problem, fixed the hardware problem, and have not seen the message since?
>
> Yes. 99% of the time it was bad RAM (verified with memtest86). Of course, for trivial emerges a subsequent emerge completed fine, but the first failure put me on the alert.

Yes, it was a bad/incompatible RAM module for the particular memory controller. memtest86 did not identify it; I found out by trial and error! The symptom was random crashes (mostly) during emerge, which would complete fine after a hard reboot. The crashes would invariably happen either when the memory controller was switching onto the next memory module, or when both modules were used up and it started using swap.

> > 3. Have you re-run the emerge and not seen the message in a while (please indicate how long a while is.)
>
> For me, a while is since fixing the hardware problem.

Ditto, i.e. about 18 months so far.

HTH.

--
Regards,
Mick
Re: [gentoo-user] SegFault while compiling gcc 4.1.1
Etaoin Shrdlu wrote:

> On Thursday 30 November 2006 07:56, Vladimir G. Ivanovic wrote:
> > Let's take a poll.
> >
> > 1. Have you seen this error message in an emerge?
>
> Yes, several times.
>
> > 2. Have you subsequently identified a hardware problem, fixed the hardware problem, and have not seen the message since?
>
> Yes. 99% of the time it was bad RAM (verified with memtest86). Of course, for trivial emerges a subsequent emerge completed fine, but the first failure put me on the alert.
>
> > 3. Have you re-run the emerge and not seen the message in a while (please indicate how long a while is.)
>
> For me, a while is since fixing the hardware problem.
>
> > BTW, do you know portage/emerge/make/whatever knows that the problem is not reproducible?
>
> If, all other things being equal, a subsequent attempt at the same operation does not exhibit the problem, or fails differently, there's a good chance that the problem is not reproducible.

Interesting responses from 3 people. But ...

I have done nothing to my hardware and I've seen this error, oh, a half a dozen times, the last time 3 months (?) ago. I ran memtest when I installed new memory, and it did not report problems even when run for hours. And I do not get random segfaults with other programs. Finally, I don't think my hardware fixed itself. Given all of this, my suspicion is that these errors are software bugs, not hardware problems.

The other thing that I don't really believe is the part about this bug not being reproducible as reported by portage/emerge/make/gcc. I don't recall any evidence that the emerge actually tried the compilation again and /succeeded/. (Why then error out rather than print a warning message like "Compilation retry succeeded on subsequent attempt; hardware problem suspected"?)

So, my suspicion is that the commentary is bogus; but I believe the part about "internal compiler error: Segfault".

--- Vladimir

--
gentoo-user@gentoo.org mailing list
Re: [gentoo-user] SegFault while compiling gcc 4.1.1
On 11/30/06, Vladimir G. Ivanovic wrote:

> I have done nothing to my hardware and I've seen this error, oh, a half a dozen times, the last time 3 months (?) ago. I ran memtest when I installed new memory, and it did not report problems even when run for hours.

memtest is basically useless these days. It can only tell you if you have a bad memory cell, which almost never happens today. Most memory problems are the result of timing issues between the processor(s) and DMA controllers. This script [1] seems to be a much better memory test for modern systems, although you may have to make some tweaks to run it on Gentoo.

> And I do not get random segfaults with other programs.

Yes, compiling is very unique in this regard. The memory access pattern of a compiler, reading and writing to locations on different rows, or even different modules, under high CPU load and using lots of memory, with some IO thrown in for good measure, tends to reveal hardware problems quite nicely.

> Finally, I don't think my hardware fixed itself. Given all of this, my suspicion is that these errors are software bugs, not hardware problems.

If we were talking about a driver, or an event-based GUI program, I might agree. But a compiler is going to take the exact same actions given the same input and options. The compiler isn't going to do something different between 2 different executions over the _exact_ same sources because it feels like it.

> The other thing that I don't really believe is the part about this bug not being reproducible as reported by portage/emerge/make/gcc.

Then you should read the gcc sources. One of the patches applied by Gentoo adds a retry loop when the compiler is about to exit with an internal compiler error (ICE). It retries the compile twice, and if either of those succeeds, you get the "The bug is not reproducible" message.

It doesn't output anything because that would possibly obscure the original error.

The gentoo devs probably added this loop to avoid more duplicates of [2].

-Richard

[1] http://people.redhat.com/dledford/memtest.html
[2] http://bugs.gentoo.org/show_bug.cgi?id=20600

--
gentoo-user@gentoo.org mailing list
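
The retry behaviour described above can be sketched roughly as follows. This is not the Gentoo patch itself (that lives in gcc's C sources and is not reproduced in this thread); it is a minimal Python illustration of the described logic. The names compile_with_retry, retries, and run are hypothetical, and the assumption that the original failing exit code is kept even when a retry succeeds is taken from the observation in this thread that the build still errors out.

```python
import subprocess

NOT_REPRODUCIBLE = (
    "The bug is not reproducible, so it is likely a hardware or OS problem."
)


def compile_with_retry(cmd, retries=2, run=subprocess.run):
    """Run `cmd` once; if it fails, rerun it up to `retries` times.

    Returns (exit_code, note). The exit code is always that of the first
    attempt, so a failed build still counts as failed. `note` is the
    "not reproducible" message if any retry succeeded, otherwise None.
    `run` is injectable so the logic can be exercised without a compiler.
    """
    first = run(cmd)
    if first.returncode == 0:
        return 0, None

    # The first attempt failed: retry only to learn whether the failure repeats.
    for _ in range(retries):
        if run(cmd).returncode == 0:
            # Same input, different outcome: the failure is not deterministic.
            return first.returncode, NOT_REPRODUCIBLE

    # Every attempt failed: the failure looks reproducible.
    return first.returncode, None
```

Called as compile_with_retry(["gcc", "-c", "foo.c"]), a note of None together with a non-zero exit code would mean every attempt failed, which points at a reproducible problem rather than flaky hardware.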
Re: [gentoo-user] SegFault while compiling gcc 4.1.1
Richard Fish wrote:

> On 11/30/06, Vladimir G. Ivanovic wrote:
> > I have done nothing to my hardware and I've seen this error, oh, a half a dozen times, the last time 3 months (?) ago. I ran memtest when I installed new memory, and it did not report problems even when run for hours.
>
> memtest is basically useless these days. It can only tell you if you have a bad memory cell, which almost never happens today. Most memory problems are the result of timing issues between the processor(s) and DMA controllers. This script [1] seems to be a much better memory test for modern systems, although you may have to make some tweaks to run it on Gentoo.

Just for kicks I'll run the script and see what happens.

> > And I do not get random segfaults with other programs.
>
> Yes, compiling is very unique in this regard. The memory access pattern of a compiler, reading and writing to locations on different rows, or even different modules, under high CPU load and using lots of memory, with some IO thrown in for good measure, tends to reveal hardware problems quite nicely.
>
> > Finally, I don't think my hardware fixed itself. Given all of this, my suspicion is that these errors are software bugs, not hardware problems.

For grins, here is part of comment #174: "Random segfaults during compilation. ... in general a sign of hardware problems. // No, this is in general a sign of GCC 4.1 - problem ;-)"

> If we were talking about a driver, or an event-based GUI program, I might agree. But a compiler is going to take the exact same actions given the same input and options. The compiler isn't going to do something different between 2 different executions over the _exact_ same sources because it feels like it.

You're right at the logical level, but not at the physical level. Cache effects and different disk accesses are two physical differences that spring to mind.

Temporary files will be in different physical sectors, or in the buffer cache or not; directories may or may not be in the directory cache. Depending on what else is running, the pattern of cache misses will be different. I emerge with -j2. Plus I'm doing work while the emerges happen. The likelihood of the memory access pattern of two compiles being the same is precisely zero.

> > The other thing that I don't really believe is the part about this bug not being reproducible as reported by portage/emerge/make/gcc.
>
> Then you should read the gcc sources. One of the patches applied by Gentoo adds a retry loop when the compiler is about to exit with an internal compiler error (ICE). It retries the compile twice, and if either of those succeeds, you get the "The bug is not reproducible" message.

Interesting. I did not know that. But I don't get why gcc exits with an error when the second (or third) try succeeds? Why not just print a warning, perhaps at the end so it is noticeable? Most people will restart the entire emerge, which seems like a gargantuan amount of wasted effort since the re-compilation has succeeded.

> It doesn't output anything because that would possibly obscure the original error.
>
> The gentoo devs probably added this loop to avoid more duplicates of [2].
>
> -Richard
>
> [1] http://people.redhat.com/dledford/memtest.html
> [2] http://bugs.gentoo.org/show_bug.cgi?id=20600

--
gentoo-user@gentoo.org mailing list
Re: [gentoo-user] SegFault while compiling gcc 4.1.1
Etaoin Shrdlu wrote:

> On Thursday 23 November 2006 14:39, Leandro Melo de Sales wrote:
> > The bug is not reproducible, so it is likely a hardware or OS problem.
>
> As the message says, you might have a hardware problem (usually bad RAM or CPU).

My experience has been that it NEVER is a hardware problem. The next emerge of the same package always completes successfully.

--- Vladimir

--
gentoo-user@gentoo.org mailing list
Re: [gentoo-user] SegFault while compiling gcc 4.1.1
On Thursday, 30 November 2006 16:16, Vladimir G. Ivanovic wrote:

> Etaoin Shrdlu wrote:
> > On Thursday 23 November 2006 14:39, Leandro Melo de Sales wrote:
> > > The bug is not reproducible, so it is likely a hardware or OS problem.
> >
> > As the message says, you might have a hardware problem (usually bad RAM or CPU).
>
> My experience has been that it NEVER is a hardware problem. The next emerge of the same package always completes successfully.

That behaviour usually indicates a hardware problem. Random unexplainable segfaults that you can't reproduce.

--
Raymond Lewis Rebbeck

--
gentoo-user@gentoo.org mailing list
Re: [gentoo-user] SegFault while compiling gcc 4.1.1
Raymond Lewis Rebbeck wrote:

> On Thursday, 30 November 2006 16:16, Vladimir G. Ivanovic wrote:
> > Etaoin Shrdlu wrote:
> > > On Thursday 23 November 2006 14:39, Leandro Melo de Sales wrote:
> > > > The bug is not reproducible, so it is likely a hardware or OS problem.
> > >
> > > As the message says, you might have a hardware problem (usually bad RAM or CPU).
> >
> > My experience has been that it NEVER is a hardware problem. The next emerge of the same package always completes successfully.
>
> That behaviour usually indicates a hardware problem. Random unexplainable segfaults that you can't reproduce.

Let's take a poll.

1. Have you seen this error message in an emerge?
2. Have you subsequently identified a hardware problem, fixed the hardware problem, and have not seen the message since?
3. Have you re-run the emerge and not seen the message in a while (please indicate how long a while is.)

For me, the answers are:

1. Yes
2. No
3. Yes (~months)

BTW, do you know portage/emerge/make/whatever knows that the problem is not reproducible?

--- Vladimir

--
gentoo-user@gentoo.org mailing list
Re: [gentoo-user] SegFault while compiling gcc 4.1.1
On Thursday 23 November 2006 14:39, Leandro Melo de Sales wrote:

> Hi, I'm trying to compile GCC 4.1.1-r1 but I got the following error message:
>
> [cut] 'main':
> /var/tmp/portage/gcc-4.1.1-r1/work/gcc-4.1.1/gcc/gcc.c:8043: internal compiler error: Segmentation fault
> Please submit a full bug report, with preprocessed source if appropriate.
> See URL:http://bugs.gentoo.org/ for instructions.
> The bug is not reproducible, so it is likely a hardware or OS problem.

As the message says, you might have a hardware problem (usually bad RAM or CPU). With this kind of error, before submitting any bug reports, try reemerging gcc: if it works, or breaks again but at another point with the same error, it's very likely that the problem lies in your hardware.

There is a good article by Daniel Robbins which explains how to troubleshoot such errors.

http://www.gentoo.org/doc/en/articles/hardware-stability-p1.xml

--
gentoo-user@gentoo.org mailing list
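
The rule of thumb above (re-emerge and compare where it fails) can be written down as a small decision function. This is only a sketch of that heuristic, not a diagnostic tool: the function name classify_rerun and the idea of representing a failure by a "file:line" string are assumptions made for illustration, and the "same point" branch is an inference, since the advice above only covers the other two cases.

```python
def classify_rerun(first_failure, second_failure):
    """Apply the re-emerge heuristic to two build attempts.

    `first_failure` is where the first build broke, e.g. "gcc.c:8043".
    `second_failure` is where the re-run broke, or None if it succeeded.
    Returns a short, deliberately hedged verdict string.
    """
    if first_failure is None:
        raise ValueError("the first build must have failed to use this heuristic")

    if second_failure is None:
        # The re-run completed: same sources, different outcome.
        return "likely hardware (re-run succeeded)"

    if second_failure != first_failure:
        # Same error, but at another point in the build.
        return "likely hardware (fails at a different point)"

    # Inference, not stated in the advice: a failure that repeats at the
    # same point is reproducible, so a software bug becomes more plausible.
    return "reproducible (consider a bug report)"
```

For example, classify_rerun("gcc.c:8043", None) gives the "re-run succeeded" verdict, which matches the situation described in this thread.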
Re: [gentoo-user] SegFault while compiling gcc 4.1.1
Thank you, I'll read it.

[]s
Leandro

2006/11/23, Etaoin Shrdlu:

> On Thursday 23 November 2006 14:39, Leandro Melo de Sales wrote:
> > Hi, I'm trying to compile GCC 4.1.1-r1 but I got the following error message:
> >
> > [cut] 'main':
> > /var/tmp/portage/gcc-4.1.1-r1/work/gcc-4.1.1/gcc/gcc.c:8043: internal compiler error: Segmentation fault
> > Please submit a full bug report, with preprocessed source if appropriate.
> > See URL:http://bugs.gentoo.org/ for instructions.
> > The bug is not reproducible, so it is likely a hardware or OS problem.
>
> As the message says, you might have a hardware problem (usually bad RAM or CPU). With this kind of error, before submitting any bug reports, try reemerging gcc: if it works, or breaks again but at another point with the same error, it's very likely that the problem lies in your hardware.
>
> There is a good article by Daniel Robbins which explains how to troubleshoot such errors.
>
> http://www.gentoo.org/doc/en/articles/hardware-stability-p1.xml
>
> --
> gentoo-user@gentoo.org mailing list

--
gentoo-user@gentoo.org mailing list