https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89865

            Bug ID: 89865
           Summary: [9 Regression] FAIL: gcc.target/i386/pr49095.c
                    scan-assembler-times \\\\), % 45
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

Two separate issues change the number of matches counted by the
scan-assembler-times directive. Consider this testcase, simplified from
gcc.target/i386/pr49095.c:

void foo (void *);

char *
hcharplus (char *x)
{
  *x += 24;
  if (!*x)
    foo (x);
  return x;
}

Current gcc trunk generates (-Os -fno-shrink-wrap -mregparm=2 -m32):

hcharplus:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movb    (%eax), %cl
        leal    24(%ecx), %edx
        movb    %dl, (%eax)
        testb   %dl, %dl
        jne     .L7
        movl    %eax, -12(%ebp)
        call    foo
        movl    -12(%ebp), %eax
.L7:
        leave
        ret

Please note the sequence:

        movb    (%eax), %cl
        leal    24(%ecx), %edx
        movb    %dl, (%eax)
        testb   %dl, %dl

which is expected to be handled by the following peephole2 pattern:

;; Likewise for instances where we have a lea pattern.
(define_peephole2
  [(set (match_operand:SWI 0 "register_operand")
        (match_operand:SWI 1 "memory_operand"))
   (set (match_operand:SWI 3 "register_operand")
        (plus:SWI (match_dup 0)
                  (match_operand:SWI 2 "<nonmemory_operand>")))
   (set (match_dup 1) (match_dup 3))
   (set (reg FLAGS_REG) (compare (match_dup 3) (const_int 0)))]
  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
   && peep2_reg_dead_p (4, operands[3])
   && (rtx_equal_p (operands[0], operands[3])
       || peep2_reg_dead_p (2, operands[0]))
   && !reg_overlap_mentioned_p (operands[0], operands[1])
   && !reg_overlap_mentioned_p (operands[3], operands[1])
   && !reg_overlap_mentioned_p (operands[0], operands[2])
   && (<MODE>mode != QImode
       || immediate_operand (operands[2], QImode)
       || any_QIreg_operand (operands[2], QImode))
   && ix86_match_ccmode (peep2_next_insn (3), CCGOCmode)"
  [(parallel [(set (match_dup 4) (match_dup 6))
              (set (match_dup 1) (match_dup 5))])]
{
  operands[4] = SET_DEST (PATTERN (peep2_next_insn (3)));
  operands[5]
    = gen_rtx_PLUS (<MODE>mode,
                    copy_rtx (operands[1]),
                    operands[2]);
  operands[6]
    = gen_rtx_COMPARE (GET_MODE (operands[4]),
                       copy_rtx (operands[5]),
                       const0_rtx);
})

However, the above pattern does not use the correct mode for the LEA insn and
does not take into account that the input and output registers of the LEA can
differ.

We have the following sequence before the peephole2 pass:

(insn 25 6 28 2 (set (reg:QI 2 cx [91])
        (mem:QI (reg/v/f:SI 0 ax [orig:87 x ] [87]) [0 *x_7(D)+0 S1 A8])) "ra.c":24:6 69 {*movqi_internal}
     (nil))
(insn 28 25 8 2 (set (reg:SI 1 dx [orig:85 _4 ] [85])
        (plus:SI (reg:SI 2 cx [91])
            (const_int 24 [0x18]))) "ra.c":24:6 186 {*leasi}
     (expr_list:REG_DEAD (reg:SI 2 cx [91])
        (nil)))
(insn 8 28 9 2 (set (mem:QI (reg/v/f:SI 0 ax [orig:87 x ] [87]) [0 *x_7(D)+0 S1 A8])
        (reg:QI 1 dx [orig:85 _4 ] [85])) "ra.c":24:6 69 {*movqi_internal}
     (nil))
(insn 9 8 10 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 1 dx [orig:85 _4 ] [85])
            (const_int 0 [0]))) "ra.c":25:6 5 {*cmpqi_ccno_1}
     (expr_list:REG_DEAD (reg:QI 1 dx [orig:85 _4 ] [85])
        (nil)))

From the above sequence, it can be seen that the LEA insn in the peephole2
pattern should use the LEAMODE mode attribute instead of the SWI mode
iterator: the LEA is done in SImode even though the load, store and compare
are in QImode. Also, because the modes differ, the input reg of (insn 28) can
match the output reg of (insn 25) only by regno, and likewise the output reg
of (insn 28) can match the reg used by (insn 8) and (insn 9) only by regno.
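
The regno-versus-mode point can be illustrated with a toy model in plain C.
This is only a sketch and not GCC code: struct reg, same_rtx, same_regno,
chain_matches and pr89865_sequence are hypothetical names made up for this
illustration. The register numbers and modes are copied from the RTL dump
above.

```c
#include <stdbool.h>

/* Toy stand-ins for machine modes and hard registers (not GCC's types).  */
enum mode { QImode, HImode, SImode };

struct reg
{
  unsigned regno;
  enum mode mode;
};

/* Strict comparison: regno and mode must both agree.  */
static bool
same_rtx (struct reg a, struct reg b)
{
  return a.regno == b.regno && a.mode == b.mode;
}

/* Relaxed comparison: only the hard register number must agree.  */
static bool
same_regno (struct reg a, struct reg b)
{
  return a.regno == b.regno;
}

/* The registers that link the four insns of the load/LEA/store/compare
   sequence.  */
struct rmw_seq
{
  struct reg load_dest;   /* (insn 25) output */
  struct reg lea_src;     /* (insn 28) input */
  struct reg lea_dest;    /* (insn 28) output */
  struct reg store_src;   /* (insn 8) input */
  struct reg cmp_src;     /* (insn 9) input */
};

/* True when each insn consumes the register produced by the previous one,
   according to the comparison function EQ.  */
static bool
chain_matches (const struct rmw_seq *s,
               bool (*eq) (struct reg, struct reg))
{
  return eq (s->load_dest, s->lea_src)
         && eq (s->lea_dest, s->store_src)
         && eq (s->store_src, s->cmp_src);
}

/* The sequence from the dump: cx is hard reg 2, dx is hard reg 1.  */
static struct rmw_seq
pr89865_sequence (void)
{
  struct rmw_seq s = {
    { 2, QImode },  /* (reg:QI 2 cx) set by insn 25 */
    { 2, SImode },  /* (reg:SI 2 cx) read by insn 28 */
    { 1, SImode },  /* (reg:SI 1 dx) set by insn 28 */
    { 1, QImode },  /* (reg:QI 1 dx) stored by insn 8 */
    { 1, QImode }   /* (reg:QI 1 dx) compared by insn 9 */
  };
  return s;
}
```

For the sequence returned by pr89865_sequence (), chain_matches fails with
same_rtx, because the QImode and SImode uses of cx and dx differ in mode,
and succeeds with same_regno.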

The other issue with the pr49095.c test is that we now spill a call-used
register around the call:

        movl    %eax, -12(%ebp)
        call    foo
        movl    -12(%ebp), %eax

where gcc-8 used a call-preserved register to save the value around the call:

        movl    %eax, %ebx
        call    foo
        movl    %ebx, %eax

However, the gcc-8 approach requires the call-preserved register %ebx to be
saved and restored in the function itself, so the new approach saves a
push/pop pair. In any case, the new assembly changes the result of the
scan-assembler-times dg directive, as

        movl    -12(%ebp), %eax

triggers the scan-assembler-times regexp.
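
The effect on the count can be checked with a small C sketch. It assumes that
the directive's pattern, once unescaped, is the literal text "), %" (a memory
operand followed by a register destination). count_occurrences, load_pattern,
spill_sequence and ebx_sequence are names made up for this illustration. The
two assembly fragments are the ones quoted above.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of the literal text NEEDLE in TEXT.  */
static size_t
count_occurrences (const char *text, const char *needle)
{
  size_t count = 0;
  size_t len = strlen (needle);

  if (len == 0)
    return 0;

  for (const char *p = strstr (text, needle); p != NULL;
       p = strstr (p + len, needle))
    count++;

  return count;
}

/* Assumed literal form of the scan-assembler-times pattern.  */
static const char load_pattern[] = "), %";

/* Current trunk: the value is spilled to the stack around the call.  */
static const char spill_sequence[] =
  "\tmovl\t%eax, -12(%ebp)\n"
  "\tcall\tfoo\n"
  "\tmovl\t-12(%ebp), %eax\n";

/* gcc-8: the value is kept in a call-preserved register.  */
static const char ebx_sequence[] =
  "\tmovl\t%eax, %ebx\n"
  "\tcall\tfoo\n"
  "\tmovl\t%ebx, %eax\n";
```

Under that assumption, count_occurrences (spill_sequence, load_pattern) is 1
and count_occurrences (ebx_sequence, load_pattern) is 0, so each call site
that is spilled this way adds one match to the total.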
