http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48610
Summary: [4.3 Regression]: loop miscompilation; load removed by -funroll-loops
Product: gcc
Version: 4.3.6
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: h...@gcc.gnu.org
Target: sparc64-*-*

Created attachment 23984
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23984
Repeat with gcc -m32 -funroll-loops -O2 -save-temps -mcpu=ultrasparc -mvis -c
and link together with the driver in the other attachment

This is a bug present in at least the gcc-4.3 series; I've checked gcc-4.3.5 and the gcc-core-4.3-20110410.tar.bz2 snapshot. It is not present in gcc-4.4.0, and not in SVN r170836 from trunk (later the 4.6 branch).

I've triaged the bug to *disappear* with SVN commit r139263 (on 2008-08-20) to trunk (later the 4.4 branch), but neither the commit nor its ChangeLog entry mentions any bug (any wrong-code) being fixed, and it was one of a swarm of improvement commits. I've been unable to locate the corresponding message to gcc-patches@. The change is related to loops, but I can't tell whether it is a correction or just an unrelated change hiding the bug, so entering this report as a note may be helpful, seeing as gcc-4.3 was the installed version at the time of this writing. I have *not* analysed the gcc execution with/without this revision. It seems likely this bug will trigger for other ports, and for code that is not even vector-related.

It is present in "gcc version 4.3.2 (Debian 4.3.2-1.1)" (plain gcc on gcc54) but not in "gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)" (gcc-4.1 on gcc54), so I think it's correct to call it a regression.

As seen in the attachment note, the bug requires -m32 -funroll-loops to appear. The "-mcpu=ultrasparc -mvis" options are also necessary for the builtins and vector code to be valid. Again, the bug does not appear if the 64-bit ABI is used.
Beware that the installed gcc on gcc54 uses the 32-bit ABI by default, but when you compile your own, you get the 64-bit ABI by default. The difference is cancelled by the explicit -m32 option.

In the attached code, the test-program is spread out over two files. I don't think that's necessary, but on the other hand fitting VIS code for use in the gcc test-suite doesn't seem like a good idea, seeing as it plain doesn't execute correctly (wrong result) on Ultrasparc T1 (gcc-63), which according to a STFW should emulate the instructions in the kernel, "2.6.26-2-sparc64-smp (Debian 2.6.26-21lenny4)".

In the test-case, the manually unrolled code (the three expanded X macro invocations) isn't executed, only the "x < rem" loop, and the wrong result comes from the second (and last) iteration. (There were originally four macro invocations, but the first has been reduced to the crumb at the beginning of the loop.)

The seemingly-pointless asms with identical input and output and faking use of other variables are an attempt to force dependencies to the GSR register. GSR is a (user-accessible) control register used by two of the builtins (of which only one, __builtin_vis_faligndatav8qi, remains in the code; the other insn is now expressed through an asm, fpack16). Unfortunately, the SPARC VIS vector port has no support for expressing this dependence, arguably a (separate) bug or at least incompleteness in the implementation. This matter is only coincidental to this bug, AFAICT.
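
For readers unfamiliar with the asm idiom described above, here is a minimal, target-independent sketch of it. It is not taken from the attachment: the names tie_to and demo are hypothetical, plain int values in general registers stand in for the VIS vector operands, and an ordinary variable stands in for whatever the GSR write depends on. The real test case presumably uses floating-point register constraints, which is an assumption on my part.

```c
#include <assert.h>

/*
 * Hypothetical helper, not from the attached test case.
 * The asm template is empty, so no instruction is emitted.
 * The "0" constraint puts the input in the same register as the
 * output, so the asm appears to rewrite `value` in place; the
 * extra "r" input makes it appear to read `dep` as well.  The
 * compiler therefore has to compute `dep` before any later use
 * of the returned value.
 */
static inline int tie_to(int value, int dep)
{
    __asm__ ("" : "=r" (value) : "0" (value), "r" (dep));
    return value;
}

/* Small usage example: the value itself is never changed. */
static int demo(void)
{
    int control = 3;   /* stand-in for state the GSR setup produces */
    int data = 42;

    data = tie_to(data, control);
    assert(data == 42); /* only the apparent dependency is new */
    return data;
}
```

This only creates a data dependency that the optimizers can see; whether that is enough to keep GSR reads and writes ordered in every case is exactly the incompleteness noted above.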