On Sat, Oct 20, 2012 at 4:38 AM, Julian Brown <jul...@codesourcery.com> wrote:
> Hi,
>
> Quite a few tests fail for big-endian multilibs which use VFP
> instructions at present. One reason for many of these is glaringly
> obvious once you notice it: for D registers interpreted as two S
> registers, the lower-numbered register is always the less-significant
> part of the value, and the higher-numbered register the
> more-significant -- regardless of the endianness the processor is
> running in.
>
> However, in big-endian mode, when DFmode values are represented in
> memory (or indeed core registers), the opposite is true. So, a subreg
> expression such as the following will work fine on core registers (or
> e.g. pseudos assigned to stack slots):
>
>   (subreg:SI (reg:DF) 0)
>
> but, when applied to a VFP register Dn, it should be resolved to the
> hard register S(n*2+1). At present though, it resolves to S(n*2) -- i.e.
> the wrong half of the value (for WORDS_BIG_ENDIAN, such a subreg should
> be the most-significant part of the value). For the relatively few cases
> where DFmode values are interpreted as a pair of (integer) words, this
> means that wrong code is generated.
>
> My feeling is that implementing a "proper" solution to this problem is
> probably impractical -- the closest existing macros to control
> behaviour aren't sufficient for this case:
>
> * FLOAT_WORDS_BIG_ENDIAN only refers to memory layout, which is correct
>   as it is.
>
> * REG_WORDS_BIG_ENDIAN controls whether values are stored in big-endian
>   order in registers, but refers to *all* registers. We only want to
>   change the behaviour for the VFP registers. Defining a new macro
>   FLOAT_REG_WORDS_BIG_ENDIAN wouldn't do, because the behaviour would
>   differ depending on the hard register under observation: that seems
>   like too much to ask of generic machinery in the middle-end.
>
> So, the attached patch just avoids the problem, by pretending that
> greater-than-word-size values in VFP registers, in big-endian mode, are
> opaque and cannot be subreg'ed. In practice, for at least the test case
> I looked at, this isn't as much of a pessimisation as you might expect
> -- the value in question might already be stored in core registers
> (e.g. for function arguments with -mfloat-abi=softfp), so it can be
> retrieved directly from those rather than via memory.
>
> This is the testsuite delta for current FSF mainline, with multilibs
> adjusted to build for little/big-endian, and using options
> "-mbig-endian -mfloat-abi=softfp -mfpu=vfpv3" for testing:
>
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O1 execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O3 -fomit-frame-pointer execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O3 -g execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -Os execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/copysign1.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/mzero6.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr35456.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O1
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O3 -fomit-frame-pointer
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O3 -g
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -Og -g
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -Os
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O1 execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O3 -fomit-frame-pointer execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O3 -g execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -Os execution test
>
> OK for mainline, or any comments? (I've included the multilib tweaks I
> used in the attached patch for reference, though I'm not proposing to
> apply those.)

I also tested this on GCC 4.7.0 with armeb-linux-gnueabi defaulting to
the hard-float ABI, and it fixes a lot of failures there too.

Thanks,
Andrew Pinski

>
> Thanks,
>
> Julian
>
> ChangeLog
>
> gcc/
>         * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Avoid subreg'ing
>         VFP D registers in big-endian mode.