Hi Jennifer,

could you next time send these out as different emails? The multiple patches as
attachments is a bit hard to reply to the relevant bits.

> From b8e2225eb288d6805cccbbfc5a46dccc0b8d61e6 Mon Sep 17 00:00:00 2001
> From: Jennifer Schmitz <[email protected]>
> Date: Fri, 22 Aug 2025 07:16:48 -0700
> Subject: [PATCH 2/2] [PR121604] aarch64: Fold svbrka/b with ptrue predicate
>  and operand to pfalse.
> 
> We can fold svbrka/b to pfalse, if the governing predicate and the operand
> are both ptrue.
> We implemented this optimization during gimple folding in
> svbrk_unary_impl::fold.
> 

Only BRKB can fold to pfalse here. BRKA will include the first element.
BRKA can however fold to ptrue pN, vl1. So the BRKA folding here is incorrect.

GCC Seems to miss most valid foldings of BRK[AB] though.  In case you wanted 
some
examples I wrote some below including the values that need to be active for it
to fold. Not saying these are required to be fixed by you, just pointing them 
out.

#include <arm_sve.h>

// should fold to return b,
__attribute__ ((noipa))
svbool_t f1m (svbool_t a, svbool_t b)
{
    return svbrka_b_m (b, svpfalse_b(), a);
}

// should fold to pfalse
__attribute__ ((noipa))
svbool_t f1z (svbool_t a)
{
    return svbrka_b_z (svpfalse_b(), a);
}

// should not fold
__attribute__ ((noipa))
svbool_t f2m (svbool_t a, svbool_t b)
{
    return svbrka_b_m (b, svptrue_b8(), a);
}

// should not fold
__attribute__ ((noipa))
svbool_t f2z (svbool_t a)
{
    return svbrka_b_z (svptrue_b8(), a);
}

// should fold to pfirst
__attribute__ ((noipa))
svbool_t f3z (svbool_t a)
{
    return svbrka_b_z (svptrue_b8(), svptrue_b8());
}

// don't fold
__attribute__ ((noipa))
svbool_t f3m (svbool_t a, svbool_t b)
{
    return svbrka_b_m (a, svptrue_b8(), a);
}

// should fold to pfirst
__attribute__ ((noipa))
svbool_t f4m (svbool_t a)
{
    return svbrka_b_m (a, svptrue_b8(), svptrue_b8 ());
}

// should fold to ptrue
__attribute__ ((noipa))
svbool_t f4z ()
{
    return svbrka_b_z (svptrue_b8(), svpfalse());
}

// fold to pfalse
__attribute__ ((noipa))
svbool_t g1z (svbool_t a)
{
    return svbrkb_b_z (svpfalse_b(), a);
}

// don't fold
__attribute__ ((noipa))
svbool_t g2m (svbool_t a)
{
    return svbrkb_b_m (svpfalse_b(), svptrue_b8(), a);
}

// don't fold
__attribute__ ((noipa))
svbool_t g2z (svbool_t a)
{
    return svbrkb_b_z (svptrue_b8(), a);
}

// should fold to ptrue
__attribute__ ((noipa))
svbool_t g3m (svbool_t a)
{
    return svbrkb_b_m (a, svptrue_b8(), svpfalse ());
}

// fold to pfalse
__attribute__ ((noipa))
svbool_t g4m (svbool_t a)
{
    return svbrkb_b_m (a, svptrue_b8(), svptrue_b8 ());
}

__attribute__ ((noipa))
void check_result (svbool_t a, svbool_t b)
{
  svbool_t pg = svptrue_b8();
  svbool_t diff = svorr_b_z(pg,
                            svand_b_z(pg, a, svnot_b_z(pg, b)),
                            svand_b_z(pg, b, svnot_b_z(pg, a)));
    if (svcntp_b8(pg, diff) != 0)
      __builtin_abort ();
}

int main ()
{
    svbool_t a = svptrue_pat_b16 (SV_VL4);
    svbool_t b = svptrue_pat_b16 (SV_VL5);

    // Should fold to second argument
    check_result (f1m (a, b), b);
    check_result (f1m (b, svpfalse ()), svpfalse ());

    // should fold to false
    check_result (f1z (a), svpfalse ());
    check_result (f1z (b), svpfalse ());

    // should fold to pfirst
    check_result (f3z (a), svptrue_pat_b16 (SV_VL1));
    check_result (f3z (b), svptrue_pat_b16 (SV_VL1));
    check_result (f3z (svptrue_b8 ()), svptrue_pat_b16 (SV_VL1));
    check_result (f3z (svpfalse ()), svptrue_pat_b16 (SV_VL1));

    // should fold to pfirst
    check_result (f4m (a), svptrue_pat_b16 (SV_VL1));
    check_result (f4m (b), svptrue_pat_b16 (SV_VL1));
    check_result (f4m (svptrue_b8 ()), svptrue_pat_b16 (SV_VL1));
    check_result (f4m (svpfalse ()), svptrue_pat_b16 (SV_VL1));

    // should fold to ptrue
    check_result (f4z (), svptrue_b8 ());

    // should fold to pfalse
    check_result (g1z (a), svpfalse ());
    check_result (g1z (b), svpfalse ());
    check_result (g1z (svptrue_b8 ()), svpfalse ());
    check_result (g1z (svpfalse ()), svpfalse ());

    // Should fold to ptrue
    check_result (g3m (a), svptrue_b8 ());
    check_result (g3m (b), svptrue_b8 ());
    check_result (g3m (svptrue_b8 ()), svptrue_b8 ());
    check_result (g3m (svpfalse ()), svptrue_b8 ());

    // should fold to pfalse
    check_result (g4m (a), svpfalse ());
    check_result (g4m (b), svpfalse ());
    check_result (g4m (svptrue_b8 ()), svpfalse ());
    check_result (g4m (svpfalse ()), svpfalse ());
}

> the case of m predication. The reason is that if get_gp_arg_index returns the
> “normal” position of the governing predicate (i.e. for x or z predication), we
> still have to adjust the position in case of m predication for a few unary 
> shapes.
> And that brings us back to needing case distinctions for specific shapes in
> gimple_folder::fold_pfalse, which Alex wanted to avoid (if I understood 
> correctly).
> What do you think?

Yeah I wanted to avoid changes to the shape class since the comments around that
area indicate that it was an intentional decision not to store the location of
the GP in the shape.  Hence my suggestion to store this at a higher level.

However I spent some time trying to understand how the parsing works, and
reading more into how the intrinsics framework is structured, I think we
actually do know the location of the GP but we don't store it.

During apply_predication we do adjust the index based on where the predicate
could be, so the shape does know where the GP is supposed to be.

So instead how about the attached approach?  I've tested it on a few of the
cases and it gives the expected results and is a much cleaner approach I think.

What do you think of going that way instead?

Thanks,
Tamar

Attachment: rb19899.patch
Description: rb19899.patch

Reply via email to