On 5/21/2026 9:40 PM, Andrew Pinski wrote:
On Thu, May 21, 2026 at 8:31 PM Jeffrey Law
<[email protected]> wrote:


On 5/18/2026 9:39 PM, [email protected] wrote:
Dear contributor,

Our automatic CI has detected problems related to your patch(es). Please find 
some details below.

In  aarch64 native, after:
    | commit gcc-17-577-gbc19036af435
    | Author: Jeff Law <[email protected]>
    | Date:   Mon May 18 15:17:27 2026 -0600
    |
    |     [RISC-V] Improve ext-dce's live bit tracking for IOR/AND with a 
constant argument
    |
    |     Investigation of a regression with some RISC-V target changes exposed 
a clear
    |     missed optimization in ext-dce.c
    |
    | ... 67 lines of the commit log omitted.

Produces 2 regressions:
    |
    | regressions.sum:
    | Running gcc:gcc.target/aarch64/aarch64.exp ...
    | FAIL: gcc.target/aarch64/tbz_1.c check-function-bodies g1
    | FAIL: gcc.target/aarch64/tbz_1.c check-function-bodies g2
I've reproduced this locally.  I haven't gotten into the debug cycle
yet, but wanted folks to know it's mine AFAICT to avoid duplicating
debugging efforts.
I think the fix is just a testcase fix.
Currently the testcase has:
**      tbnz    w[0-9]+, #?0, .L([0-9]+)

But that should just can be:
**      tbnz    [wx][0-9]+, #?0, .L([0-9]+)

to match both x0 and w0 there. But are valid in this case with bit 0.
For both g1 and g2.
Yea, so I wanted to verify this was behaving per expectation.  It is.  We have this in g1:

(insn 2 4 3 2 (set (reg/v:SI 101 [ x ])
        (zero_extend:SI (reg:QI 0 x0 [ x ]))) "j.c":16:1 149 {*zero_extendqisi2_aarch64}
     (expr_list:REG_DEAD (reg:QI 0 x0 [ x ])
        (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SI 102)
        (and:SI (reg/v:SI 101 [ x ])
            (const_int 1 [0x1]))) "j.c":17:6 discrim 1 551 {andsi3}
     (expr_list:REG_DEAD (reg/v:SI 101 [ x ])
        (nil)))

We can trivially see that the zero extension is pointless because the value is masked with 0x1 immediately thereafter.  So ext-dce replaces the zero_extend with a subreg which then simplifies the sequence into:

(insn 2 4 3 2 (set (reg/v:SI 101 [ x ])
        (reg:SI 0 x0 [ x ])) "j.c":16:1 104 {*movsi_aarch64}
     (expr_list:REG_DEAD (reg:QI 0 x0 [ x ])
        (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SI 102)
        (and:SI (reg/v:SI 101 [ x ])
            (const_int 1 [0x1]))) "j.c":17:6 discrim 1 551 {andsi3}
     (expr_list:REG_DEAD (reg/v:SI 101 [ x ])
        (nil)))

Combine then turns it into:

(insn 28 4 2 2 (set (reg:SI 103 [ x ])
        (reg:SI 0 x0 [ x ])) "j.c":16:1 -1
     (expr_list:REG_DEAD (reg:SI 0 x0 [ x ])
        (nil)))
(note 2 28 3 2 NOTE_INSN_DELETED)
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(note 6 3 7 2 NOTE_INSN_DELETED)
(note 7 6 8 2 NOTE_INSN_DELETED)
(jump_insn 8 7 9 2 (parallel [
            (set (pc)
                (if_then_else (eq (zero_extract:DI (subreg:DI (reg:SI 103 [ x ]) 0)
                            (const_int 1 [0x1])
                            (const_int 0 [0]))
                        (const_int 0 [0]))
                    (label_ref:DI 14)
                    (pc)))
            (clobber (reg:CC 66 cc))
        ]) "j.c":17:6 discrim 1 76 {aarch64_tbzeqdidi}
     (expr_list:REG_UNUSED (reg:CC 66 cc)
        (expr_list:REG_DEAD (reg:SI 103 [ x ])
            (int_list:REG_BR_PROB 966367641 (nil))))

Without the recent ext-dce improvement we get this out of combine:

(jump_insn 8 7 9 2 (parallel [
            (set (pc)
                (if_then_else (eq (zero_extract:SI (reg:SI 0 x0 [ x ])
                            (const_int 1 [0x1])
                            (const_int 0 [0]))
                        (const_int 0 [0]))
                    (label_ref:DI 14)
                    (pc)))
            (clobber (reg:CC 66 cc))
        ]) "k.c":17:6 discrim 1 47 {aarch64_tbeqsisi}
     (expr_list:REG_UNUSED (reg:CC 66 cc)
        (expr_list:REG_DEAD (reg:QI 0 x0 [ x ])
            (int_list:REG_BR_PROB 966367644 (nil))))

Note the difference in zero_extract's mode within the bit-test insn.  DI vs SI.  Hence the w vs x difference in the resulting assembly code.

Attached is the commited change to the testsuite.

Jeff

commit 2ec684467cbc610348ebd7b71e01425629b82f73
Author: Jeff Law <[email protected]>
Date:   Fri May 22 09:21:31 2026 -0600

    Re: [Linaro-TCWG-CI] gcc-17-577-gbc19036af435: 2 regressions on aarch64
    
    >>> Produces 2 regressions:
    >>>    |
    >>>    | regressions.sum:
    >>>    | Running gcc:gcc.target/aarch64/aarch64.exp ...
    >>>    | FAIL: gcc.target/aarch64/tbz_1.c check-function-bodies g1
    >>>    | FAIL: gcc.target/aarch64/tbz_1.c check-function-bodies g2
    >> I've reproduced this locally.  I haven't gotten into the debug cycle
    >> yet, but wanted folks to know it's mine AFAICT to avoid duplicating
    >> debugging efforts.
    >
    > I think the fix is just a testcase fix.
    > Currently the testcase has:
    > **      tbnz    w[0-9]+, #?0, .L([0-9]+)
    >
    > But that should just can be:
    > **      tbnz    [wx][0-9]+, #?0, .L([0-9]+)
    >
    > to match both x0 and w0 there. But are valid in this case with bit 0.
    > For both g1 and g2.
    Yea, so I wanted to verify this was behaving per expectation.  It is.  We 
have
    this in g1:
    
    > (insn 2 4 3 2 (set (reg/v:SI 101 [ x ])
    >         (zero_extend:SI (reg:QI 0 x0 [ x ]))) "j.c":16:1 149 
{*zero_extendqisi2_aarch64}
    >      (expr_list:REG_DEAD (reg:QI 0 x0 [ x ])
    >         (nil)))
    > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
    > (insn 6 3 7 2 (set (reg:SI 102)
    >         (and:SI (reg/v:SI 101 [ x ])
    >             (const_int 1 [0x1]))) "j.c":17:6 discrim 1 551 {andsi3}
    >      (expr_list:REG_DEAD (reg/v:SI 101 [ x ])
    >         (nil)))
    
    We can trivially see that the zero extension is pointless because the value 
is
    masked with 0x1 immediately thereafter.  So ext-dce replaces the zero_extend
    with a subreg which then simplifies the sequence into:
    
    > (insn 2 4 3 2 (set (reg/v:SI 101 [ x ])
    >         (reg:SI 0 x0 [ x ])) "j.c":16:1 104 {*movsi_aarch64}
    >      (expr_list:REG_DEAD (reg:QI 0 x0 [ x ])
    >         (nil)))
    > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
    > (insn 6 3 7 2 (set (reg:SI 102)
    >         (and:SI (reg/v:SI 101 [ x ])
    >             (const_int 1 [0x1]))) "j.c":17:6 discrim 1 551 {andsi3}
    >      (expr_list:REG_DEAD (reg/v:SI 101 [ x ])
    >         (nil)))
    
    Combine then turns it into:
    
    > (insn 28 4 2 2 (set (reg:SI 103 [ x ])
    >         (reg:SI 0 x0 [ x ])) "j.c":16:1 -1
    >      (expr_list:REG_DEAD (reg:SI 0 x0 [ x ])
    >         (nil)))
    > (note 2 28 3 2 NOTE_INSN_DELETED)
    > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
    > (note 6 3 7 2 NOTE_INSN_DELETED)
    > (note 7 6 8 2 NOTE_INSN_DELETED)
    > (jump_insn 8 7 9 2 (parallel [
    >             (set (pc)
    >                 (if_then_else (eq (zero_extract:DI (subreg:DI (reg:SI 103 
[ x ]) 0)
    >                             (const_int 1 [0x1])
    >                             (const_int 0 [0]))
    >                         (const_int 0 [0]))
    >                     (label_ref:DI 14)
    >                     (pc)))
    >             (clobber (reg:CC 66 cc))
    >         ]) "j.c":17:6 discrim 1 76 {aarch64_tbzeqdidi}
    >      (expr_list:REG_UNUSED (reg:CC 66 cc)
    >         (expr_list:REG_DEAD (reg:SI 103 [ x ])
    >             (int_list:REG_BR_PROB 966367641 (nil))))
    
    Without the recent ext-dce improvement we get this out of combine:
    
    > (jump_insn 8 7 9 2 (parallel [
    >             (set (pc)
    >                 (if_then_else (eq (zero_extract:SI (reg:SI 0 x0 [ x ])
    >                             (const_int 1 [0x1])
    >                             (const_int 0 [0]))
    >                         (const_int 0 [0]))
    >                     (label_ref:DI 14)
    >                     (pc)))
    >             (clobber (reg:CC 66 cc))
    >         ]) "k.c":17:6 discrim 1 47 {aarch64_tbeqsisi}
    >      (expr_list:REG_UNUSED (reg:CC 66 cc)
    >         (expr_list:REG_DEAD (reg:QI 0 x0 [ x ])
    >             (int_list:REG_BR_PROB 966367644 (nil))))
    
    Note the difference in zero_extract's mode within the bit-test insn.  DI vs 
SI.
    Hence the w vs x difference in the resulting assembly code.
    
    gcc/testsuite
            * gcc.target/aarch64/tbz_1.c: Update expected output.

diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_1.c 
b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
index 39deb58e278..065e7f62cf8 100644
--- a/gcc/testsuite/gcc.target/aarch64/tbz_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
@@ -8,7 +8,7 @@ void h(void);
 
 /*
 ** g1:
-**     tbnz    w[0-9]+, #?0, .L([0-9]+)
+**     tbnz    [wx][0-9]+, #?0, .L([0-9]+)
 **     ret
 **     ...
 */
@@ -20,7 +20,7 @@ void g1(bool x)
 
 /*
 ** g2:
-**     tbz     w[0-9]+, #?0, .L([0-9]+)
+**     tbz     [wx][0-9]+, #?0, .L([0-9]+)
 **     b       h
 **     ...
 */

Reply via email to