[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 Steven Bosscher steven at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED --- Comment #13 from Steven Bosscher steven at gcc dot gnu.org 2012-11-08 23:01:54 UTC --- Fixed according to comment #12.
[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 --- Comment #11 from Segher Boessenkool segher at gcc dot gnu.org 2012-09-26 05:18:49 UTC --- Author: segher Date: Wed Sep 26 05:18:43 2012 New Revision: 191752 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=191752 Log: gcc/ PR target/51274 PR target/53087 * config/rs6000/rs6000.md (ne0si): Remove unnecessary earlyclobber. Merge with... (ne0di): ... to... (ne0_mode): New. (plus_ne0si): Merge with... (plus_ne0di): ... to... (plus_ne0_mode): New. (compare_plus_ne0si): Merge with... (compare_plus_ne0di)... to... (compare_plus_ne0_mode): New. (compare_plus_ne0_mode_1): New. (plus_ne0si_compare): Merge with... (plus_ne0di_compare)... to... (plus_ne0_mode_compare): New. gcc/testsuite/ PR target/51274 PR target/53087 * gcc.target/powerpc/ppc-ne0-1.c: New. Added: trunk/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/rs6000/rs6000.md trunk/gcc/testsuite/ChangeLog
[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 Segher Boessenkool segher at gcc dot gnu.org changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #12 from Segher Boessenkool segher at gcc dot gnu.org 2012-09-26 05:42:26 UTC --- With the abs patterns gone, and the ne0 patterns fixed, mainline now generates li 10,1 lis 9,0xcf8 ori 9,9,63 sld 3,10,3 and 3,3,9 addic 9,3,-1 subfe 3,9,3 blr which is the expected code. It would be nice if we could get the 4.1 code back, but that's a separate problem (and not a target problem I think).
[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 Paolo Bonzini bonzini at gnu dot org changed: What|Removed |Added Status|NEW |ASSIGNED AssignedTo|unassigned at gcc dot |bonzini at gnu dot org |gnu.org | --- Comment #10 from Paolo Bonzini bonzini at gnu dot org 2012-04-26 15:52:32 UTC --- Created attachment 27248 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27248 tentative patch Any change in expmed.c is not enough to avoid the cr-setting instruction and., because we get that instruction by design: For SNE, we would prefer that the xor/abs sequence be used for integers. For SEQ, likewise, except that comparisons with zero should be done with an scc insns. However, due to the order that combine see the resulting insns, we must, in fact, allow SEQ for integers. The question is whether it is really true that For SEQ, likewise, except that comparisons with zero should be done with an scc insns. In general, I think no. I tried disabling scc altogether on this testcase: int eq1_phi (unsigned long a) { return a ? 0 : 1; } and Steven's smaller testcase. They compile respectively to: eq1_phi: sradi 0,3,63 xor 3,0,3 subf 3,0,3 addi 3,3,-1 srdi 3,3,63 blr foo: li 9,1 lis 0,0xcf8 sld 3,9,3 ori 0,0,63 and 0,3,0 neg 3,0 srdi 3,3,63 blr In both cases the code is produced by the abs-based trick in emit_store_flag. The trick produces bad code in the first case, because combine finds nothing to simplify---unlike Steven's case, where abs effectively goes away completely. Since many current PPC processors lack the abs/nabs instructions and do implement clz/ctz, I tried teaching expmed.c the cntlz trick of comment 5. This improves eq1_phi noticeably and produces one instruction more than above (as expected, this is roughly the same code as XLC): eq1_phi: cntlzd 3,3 srdi 3,3,6 blr foo: li 9,1 lis 0,0xcf8 sld 3,9,3 ori 0,0,63 and 3,3,0 cntlzd 3,3 addi 3,3,-64 srdi 3,3,63 blr Overall, I think it is preferable to get the slight pessimization of foo and get better code when combine cannot optimize away the ABS. It would anyway fix the bug, and the slight pessimization can still be offset in other ways, as suggested below. Hence I'm attaching a patch that teaches expmed to use cntlz, removes absmode_nopower patterns, and makes cstoresi4 do the same thing for EQ and NE. I will add testcases and bootstrap/regtest on x86 before submitting, but of course in the meanwhile it would help a lot if somebody can bootstrap/regtest it on ppc and/or ppc64. On top of this, one thing to look at is making switch expansion better. For example one way to solve this bug in a target-independent way could be to expand switch statements to jump if (mask a) 0 (with a bit-reversed mask). 0 and =0 are a bit easier to convert to sCC than ==0 and !=0. On SPARC the bit-reversed mask can even be cheaper to set up (a single mov covers 14-bit immediates, a single sethi covers 22-bit immediates as long as the lowest 10 are zero); on PowerPC and ARM it should be roughly the same. I don't know about Thumb. Another thing to do is to use a smaller type if possible, in this case int instead of long. This is because loading 64-bit constants can be expensive. This is especially true with the bit-reversed mask, but also in general; on PowerPC, loading 0xCF80003F in DImode is one instruction more than 0xCF80003F in SImode, because of a rldicl instruction needed to clear the high word. This two improvements give for the same input bits as comment 0: int bar (unsigned long a) { return (((int)0xFC001F30 a) 0); } which compiles to even better code: bar: lis 0,0xfc00 ori 0,0,7984 slw 3,0,3 srwi 3,3,31 blr or on x86: movl%edi, %ecx movl$-67100880, %eax sall%cl, %eax shrl$31, %eax ret Finally, switch expansion could also pick a different jump direction (!= vs == for the current code, vs. = with the above suggestion) to minimize the number of bits set in the mask. On PPC for example, 0x0CF8 is one instruction more expensive than 0xF307.
[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 Alan Modra amodra at gmail dot com changed: What|Removed |Added CC||bonzini at gnu dot org --- Comment #8 from Alan Modra amodra at gmail dot com 2012-04-25 12:58:17 UTC --- The regression appeared with r149032 2009-06-28 Paolo Bonzini bonz...@gnu.org * expr.c (expand_expr_real_1): Just use do_store_flag. (do_store_flag): Drop support for TRUTH_NOT_EXPR. Use emit_store_flag_force. * expmed.c (emit_store_flag_force): Copy here trick previously in expand_expr_real_1. Try reversing the comparison. (emit_store_flag_1): Work if target is NULL. (emit_store_flag): Work if target is NULL, using the result mode from the comparison. Use split_comparison, restructure final part to simplify conditionals.
[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 --- Comment #9 from Paolo Bonzini bonzini at gnu dot org 2012-04-25 20:00:57 UTC --- The handling of this code sequence in fold-const changed back and forth many times, and this is likely the reason why GCC 4.1 produced straight-line code while GCC 4.3 produced branchy code. I think the best fix is to add support for expanding x != 0 using the cntlzw trick---either in the cstore expander or in emit_store_flag. In fact, emit_store_flag already has code for a similar trick: /* For EQ or NE, one way to do the comparison is to apply an operation that converts the operand into a positive number if it is nonzero or zero if it was originally zero. Then, for EQ, we subtract 1 and for NE we negate. This puts the result in the sign bit. Then we normalize with a shift, if needed. Two operations that can do the above actions are ABS and FFS, so try them. If that doesn't work, and MODE is smaller than a full word, we can use zero-extension to the wider mode (an unsigned conversion) as the operation. */ So another thing to do is to investigate why this doesn't work.
[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 --- Comment #7 from Alan Modra amodra at gmail dot com 2012-04-25 05:26:28 UTC --- Some more data points. The testcase in #1 produces gcc-4.3.6 cmpldi 7,3,27 mr 9,3 li 3,0 bgtlr 7 lis 0,0xcf8 ori 0,0,63 srd 0,0,9 rldicl 3,0,0,63 blr gcc-4.4.7 cmpldi 7,3,27 li 0,0 bgt 7,.L3 lis 0,0xcf8 ori 0,0,63 srd 0,0,3 rldicl 0,0,0,63 .L3: mr 3,0 blr gcc-4.5.0 cmpldi 7,3,27 li 0,0 bgt 7,.L2 li 0,1 sld 3,0,3 lis 0,0xcf8 ori 0,0,63 and. 9,3,0 mfcr 0 rlwinm 0,0,3,1 xori 0,0,1 extsw 0,0 .L2: mr 3,0 blr
[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 Steven Bosscher steven at gcc dot gnu.org changed: What|Removed |Added Summary|Poor code for conversion|[powerpc] Poor code from |from _Bool to int |cstore expander --- Comment #3 from Steven Bosscher steven at gcc dot gnu.org 2012-04-23 20:12:37 UTC --- The first time round to rs6000.md:cstoremode4, the insn isn't generated because code==NE. The second time, some insns are emitted: Breakpoint 11, emit_store_flag_1 (target=0x0, code=EQ, op0=0xfffb5f092c0, op1=0xfffb5f60470, mode=DImode, unsignedp=0, normalizep=1, target_mode=QImode) at ../../trunk/gcc/expmed.c:5363 5363 do_pending_stack_adjust (); (gdb) next 5364 tem = emit_cstore (target, icode, code, mode, compare_mode, (gdb) 5366 if (tem) (gdb) p tem $67 = (rtx) 0xfffb5f094a0 (gdb) p debug_rtx(tem) (reg:QI 132) $68 = void (gdb) p debug_rtx_list (get_last_insn(), -7) (insn 15 14 16 (set (reg:CC 134) (compare:CC (reg:DI 123 [ D.2005 ]) (const_int 0 [0]))) t.c:16 -1 (nil)) (insn 16 15 17 (set (reg:DI 135) (eq:DI (reg:CC 134) (const_int 0 [0]))) t.c:16 -1 (nil)) (insn 17 16 18 (set (reg:SI 133) (subreg:SI (reg:DI 135) 4)) t.c:16 -1 (nil)) (insn 18 17 0 (set (reg:QI 132) (subreg:QI (reg:SI 133) 3)) t.c:16 -1 (nil)) $69 = void (gdb) next 5367return tem; (gdb) 5381} (gdb) emit_store_flag (target=0xfffb5f09460, code=NE, op0=0xfffb5f092c0, op1=0xfffb5f60470, mode=DImode, unsignedp=1, normalizep=1) at ../../trunk/gcc/expmed.c:5578 5578 if (tem != 0) (gdb) p tem $70 = (rtx) 0xfffb5f094a0 (gdb) l 5573rtx_cost (trueval, XOR, 1, 5574optimize_insn_for_speed_p ()) == 0) 5575{ 5576 tem = emit_store_flag_1 (subtarget, rcode, op0, op1, mode, 0, 5577 normalizep, target_mode); 5578 if (tem != 0) 5579tem = expand_binop (target_mode, xor_optab, tem, trueval, target, 5580INTVAL (trueval) = 0, OPTAB_WIDEN); 5581} So the problem is not the _Bool-int conversion but the cstore for ;; D.2013_7 = D.2005_5 != 0;
[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 --- Comment #4 from Steven Bosscher steven at gcc dot gnu.org 2012-04-23 20:13:29 UTC --- Smaller test case: _Bool foo (long unsigned int a) { return (((1L a) 217579583UL) != 0); } == .file t.c .section.toc,aw .section.text .align 2 .p2align 4,,15 .globl foo .section.opd,aw .align 3 foo: .quad .L.foo,.TOC.@tocbase,0 .previous .type foo, @function .L.foo: li 10,1 lis 9,0xcf8 sld 3,10,3 ori 9,9,63 and. 10,3,9 mfcr 3 rlwinm 3,3,3,1 xori 3,3,1 blr .long 0 .byte 0,0,0,0,0,0,0,0 .size foo,.-.L.foo .ident GCC: (GNU) 4.8.0 20120418 (experimental) [trunk revision 186580]
[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 David Edelsohn dje at gcc dot gnu.org changed: What|Removed |Added CC||dje at gcc dot gnu.org --- Comment #5 from David Edelsohn dje at gcc dot gnu.org 2012-04-23 20:23:42 UTC --- XLC produces: addi r0,r0,1 addis r4,r0,3320 slwr0,r0,r3 addi r3,r4,63 andr0,r0,r3 cntlzw r3,r0 addi r0,r3,-32 rlwinm r3,r0,1,31,31 blr
[Bug target/53087] [powerpc] Poor code from cstore expander
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 --- Comment #6 from David Edelsohn dje at gcc dot gnu.org 2012-04-24 00:40:28 UTC --- GCC 4.1 produced: lis 9,0xcf8 li 0,1 ori 9,9,63 slw 0,0,3 and 0,0,9 neg 0,0 srwi 3,0,31 blr The branch code is better than the code using mfcr, but the straight-line code from XLC or GCC avoiding CR is better than both.