[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

rguenth at gcc dot gnu.org Thu, 06 Feb 2014 02:29:47 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59999


--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Paulo J. Matos from comment #10)
> (In reply to Paulo J. Matos from comment #8)
> > 
> > Made a mistake. With the attached test, the final gimple before expand for
> > the loop basic block is:
> > ;;   basic block 5, loop depth 0
> > ;;    pred:       5
> > ;;                4
> >   # i_26 = PHI <i_1(5), 0(4)>
> >   # ivtmp.24_18 = PHI <ivtmp.24_12(5), ivtmp.24_29(4)>
> >   _28 = (void *) ivtmp.24_18;
> >   _13 = MEM[base: _28, offset: 0B];
> >   x.4_14 = x;
> >   _15 = _13 ^ x.4_14;
> >   MEM[base: _28, offset: 0B] = _15;
> >   ivtmp.24_12 = ivtmp.24_18 + 4;
> >   temp_ptr.5_17 = (Sample *) ivtmp.24_12;
> >   _11 = (unsigned short) i_26;
> >   _2 = _11 + 1;
> >   i_1 = (short int) _2;
> >   _10 = (int) i_1;
> >   if (_10 < _25)
> >     goto <bb 5>;
> >   else
> >     goto <bb 6>;
> > ;;    succ:       5
> > ;;                6
> > 
> > However, the point is the same. IVOPTS should probably generate an int IV
> > instead of a short int IV to avoid the sign extend since removing the sign
> > extend during RTL seems to be quite hard.
> > 
> > What do you think?
> 
> For >= 4.8 the scalar evolution of _10 is deemed not simple, because it
> looks like the following:
>  <nop_expr 0x2aaaaacd9ee0
>     type <integer_type 0x2aaaaab16690 int public SI
>         size <integer_cst 0x2aaaaab12c60 constant 32>
>         unit size <integer_cst 0x2aaaaab12c80 constant 4>
>         align 32 symtab 0 alias set 3 canonical type 0x2aaaaab16690
> precision 32 min <integer_cst 0x2aaaaab12f80 -2147483648> max <integer_cst
> 0x2aaaaab12fa0 2147483647> context <translation_unit_decl 0x2aaaaab29c00
> D.2881>
>         pointer_to_this <pointer_type 0x2aaaaab23348>>
>    
>     arg 0 <polynomial_chrec 0x2aaaaacdb090
>         type <integer_type 0x2aaaaab16540 short int sizes-gimplified public
> HI
>             size <integer_cst 0x2aaaaab12f20 constant 16>
>             unit size <integer_cst 0x2aaaaab12f40 constant 2>
>             align 16 symtab 0 alias set 4 canonical type 0x2aaaaab16540
> precision 16 min <integer_cst 0x2aaaaab12ec0 -32768> max <integer_cst
> 0x2aaaaab12ee0 32767>
>             pointer_to_this <pointer_type 0x2aaaaaca1f18>>
>        
>         arg 0 <integer_cst 0x2aaaaab1f260 constant 1>
>         arg 1 <integer_cst 0x2aaaaacc9140 constant 1> arg 2 <integer_cst
> 0x2aaaaacc9140 1>>>
> 
> This is something like: (int) (short int) {1, +, 1}_1. Since these are
> signed integers, we can assume they don't overflow, can't we simplify the
> scalar evolution to a polynomial_chrec over 32bit integers and forget the
> nop_expr that represents the sign extend?

Note that {1, +, 1}_1 is unsigned.  The issue is that although i is a short,
i++ is really i = (short)((int) i + 1), so only the addition in type 'int'
is known not to overflow.  The IV in short _can_ therefore overflow, and
the loop can run forever, for example for loopCount == SHORT_MAX + 1.
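
To make the wrap-around concrete, here is a minimal sketch in C.  It is not
the attached testcase: the names count_iterations, short_iv_wraps and
check_short_iv, and the iteration cap, are made up for illustration.  It
assumes a 16-bit short and GCC's modulo behaviour for the
implementation-defined int-to-short conversion.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: runs a loop with a short IV compared against an int
   bound, and gives up after `cap` iterations so that it always returns. */
long count_iterations(int bound, long cap)
{
    long n = 0;
    short i = 0;
    while ((int) i < bound && n < cap) {
        /* What i++ means for a short: add in int, then convert back. */
        i = (short) ((int) i + 1);
        n++;
    }
    return n;
}

/* Returns 1 if incrementing a short IV at SHRT_MAX wraps to SHRT_MIN.
   The addition in int cannot overflow; only the conversion back wraps. */
int short_iv_wraps(void)
{
    short i = SHRT_MAX;
    i = (short) ((int) i + 1);
    return i == SHRT_MIN;
}

/* Runs the checks; aborts via assert if an assumption does not hold. */
void check_short_iv(void)
{
    assert(short_iv_wraps());
    /* A bound of SHRT_MAX is reached normally. */
    assert(count_iterations(SHRT_MAX, 100000L) == SHRT_MAX);
    /* A bound of SHRT_MAX + 1 is never reached; only the cap stops us. */
    assert(count_iterations(SHRT_MAX + 1, 100000L) == 100000L);
}
```

With a bound of SHRT_MAX + 1 the comparison in int stays true on every
iteration, which is why the sign extension of the short IV cannot simply be
dropped.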

The purpose of the fix to SCEV analysis was to keep the evolution
analyzable at all.

The testcase is simply very badly written (unsigned short upper bound,
signed short IV and IV comparison against upper bound in signed int).
