https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106161

            Bug ID: 106161
           Summary: Dubious choice of optimization strategy
           Product: gcc
           Version: 9.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vluchits at gmail dot com
  Target Milestone: ---

Hello,

here's a piece of C code:

...
#define AC_NEWCEILING           16
#define AC_NEWFLOOR             32

...
        if (newclipbounds)
        {
            int newfloorclipx = floorclipx;
            int newceilingclipx = ceilingclipx;
            uint16_t newclip;

            // rewrite clipbounds
            if (actionbits & AC_NEWFLOOR)
                newfloorclipx = low;
            if (actionbits & AC_NEWCEILING)
                newceilingclipx = high;

            newclip = (newceilingclipx << 8) + newfloorclipx;
            clipbounds[x] = newclip;
            newclipbounds[x] = newclip;
        }
...

which is compiled with -Os and results in the following set of SH-2 assembler
instructions:

        if (newclipbounds)
 190:   54 fb           mov.l   @(44,r15),r4
 192:   24 48           tst     r4,r4
 194:   8d 11           bt.s    1ba <_R_SegLoop+0x1ba>
 196:   e0 58           mov     #88,r0
            if (actionbits & AC_NEWFLOOR)
 198:   05 fe           mov.l   @(r0,r15),r5
 19a:   25 58           tst     r5,r5
 19c:   8f 01           bf.s    1a2 <_R_SegLoop+0x1a2>
 19e:   e0 5c           mov     #92,r0
        floorclipx = ceilingclipx & 0x00ff;
 1a0:   67 93           mov     r9,r7
            if (actionbits & AC_NEWCEILING)
 1a2:   00 fe           mov.l   @(r0,r15),r0
 1a4:   20 08           tst     r0,r0
 1a6:   8f 01           bf.s    1ac <_R_SegLoop+0x1ac>
 1a8:   e0 40           mov     #64,r0
            int newceilingclipx = ceilingclipx;
 1aa:   66 83           mov     r8,r6
            clipbounds[x] = newclip;
 1ac:   00 fe           mov.l   @(r0,r15),r0
            newclip = (newceilingclipx << 8) + newfloorclipx;
 1ae:   46 18           shll8   r6
 1b0:   37 6c           add     r6,r7
 1b2:   67 7d           extu.w  r7,r7
            clipbounds[x] = newclip;
 1b4:   0c 75           mov.w   r7,@(r0,r12)
            newclipbounds[x] = newclip;
 1b6:   50 fb           mov.l   @(44,r15),r0
 1b8:   0c 75           mov.w   r7,@(r0,r12)


What I find really odd is that gcc opts to cache the results of the bitwise
ANDs on the stack and reload them individually instead of simply doing
tst #imm1,r0 and tst #imm2,r0. There are more instances of this behavior
further down the same function.
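
For reference, the clip-bounds logic can be isolated into a standalone
function, which might serve as a starting point for a reduced test case. This
is only a sketch: the helper name pack_clip and its parameter list are
hypothetical and not part of the original source, and in isolation it may well
not reproduce the stack caching seen inside R_SegLoop, since register pressure
there is presumably much higher.

```c
#include <stdint.h>

#define AC_NEWCEILING           16
#define AC_NEWFLOOR             32

/* Hypothetical helper (not in the original code): computes the packed
 * clip value the same way the snippet above does. Both masks fit in the
 * 8-bit immediate of the SH "tst #imm,r0" instruction. */
uint16_t pack_clip(int actionbits, int floorclipx, int ceilingclipx,
                   int low, int high)
{
    int newfloorclipx = floorclipx;
    int newceilingclipx = ceilingclipx;

    /* rewrite clipbounds */
    if (actionbits & AC_NEWFLOOR)
        newfloorclipx = low;
    if (actionbits & AC_NEWCEILING)
        newceilingclipx = high;

    /* ceiling goes in the high byte, floor in the low byte */
    return (uint16_t)((newceilingclipx << 8) + newfloorclipx);
}
```

Compiling this with the same -Os setting and comparing the output against the
listing above could help tell whether the reloads come from the flag tests
themselves or from the surrounding function.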

Now memory reads are really expensive on the target architecture and I would
like to avoid them if possible. I'm not sure whether this behavior is triggered
by some optimization setting or is inherent to the architecture, but I'd
appreciate any help here.
