Hi Jennifer, could you next time send these out as different emails? The multiple patches as attachments is a bit hard to reply to the relevant bits.
> From b8e2225eb288d6805cccbbfc5a46dccc0b8d61e6 Mon Sep 17 00:00:00 2001 > From: Jennifer Schmitz <[email protected]> > Date: Fri, 22 Aug 2025 07:16:48 -0700 > Subject: [PATCH 2/2] [PR121604] aarch64: Fold svbrka/b with ptrue predicate > and operand to pfalse. > > We can fold svbrka/b to pfalse, if the governing predicate and the operand > are both ptrue. > We implemented this optimization during gimple folding in > svbrk_unary_impl::fold. > Only BRKB can fold to pfalse here. BRKA will include the first element. BRKA can however fold to ptrue pN, vl1. So the BRKA folding here is incorrect. GCC Seems to miss most valid foldings of BRK[AB] though. In case you wanted some examples I wrote some below including the values that need to be active for it to fold. Not saying these are required to be fixed by you, just pointing them out. #include <arm_sve.h> // should fold to return b, __attribute__ ((noipa)) svbool_t f1m (svbool_t a, svbool_t b) { return svbrka_b_m (b, svpfalse_b(), a); } // should fold to pfalse __attribute__ ((noipa)) svbool_t f1z (svbool_t a) { return svbrka_b_z (svpfalse_b(), a); } // should not fold __attribute__ ((noipa)) svbool_t f2m (svbool_t a, svbool_t b) { return svbrka_b_m (b, svptrue_b8(), a); } // should not fold __attribute__ ((noipa)) svbool_t f2z (svbool_t a) { return svbrka_b_z (svptrue_b8(), a); } // should fold to pfirst __attribute__ ((noipa)) svbool_t f3z (svbool_t a) { return svbrka_b_z (svptrue_b8(), svptrue_b8()); } // don't fold __attribute__ ((noipa)) svbool_t f3m (svbool_t a, svbool_t b) { return svbrka_b_m (a, svptrue_b8(), a); } // should fold to pfirst __attribute__ ((noipa)) svbool_t f4m (svbool_t a) { return svbrka_b_m (a, svptrue_b8(), svptrue_b8 ()); } // should fold to ptrue __attribute__ ((noipa)) svbool_t f4z () { return svbrka_b_z (svptrue_b8(), svpfalse()); } // fold to pfalse __attribute__ ((noipa)) svbool_t g1z (svbool_t a) { return svbrkb_b_z (svpfalse_b(), a); } // don't fold __attribute__ ((noipa)) svbool_t g2m (svbool_t a) { return svbrkb_b_m (svpfalse_b(), svptrue_b8(), a); } // don't fold __attribute__ ((noipa)) svbool_t g2z (svbool_t a) { return svbrkb_b_z (svptrue_b8(), a); } // should fold to ptrue __attribute__ ((noipa)) svbool_t g3m (svbool_t a) { return svbrkb_b_m (a, svptrue_b8(), svpfalse ()); } // fold to pfalse __attribute__ ((noipa)) svbool_t g4m (svbool_t a) { return svbrkb_b_m (a, svptrue_b8(), svptrue_b8 ()); } __attribute__ ((noipa)) void check_result (svbool_t a, svbool_t b) { svbool_t pg = svptrue_b8(); svbool_t diff = svorr_b_z(pg, svand_b_z(pg, a, svnot_b_z(pg, b)), svand_b_z(pg, b, svnot_b_z(pg, a))); if (svcntp_b8(pg, diff) != 0) __builtin_abort (); } int main () { svbool_t a = svptrue_pat_b16 (SV_VL4); svbool_t b = svptrue_pat_b16 (SV_VL5); // Should fold to second argument check_result (f1m (a, b), b); check_result (f1m (b, svpfalse ()), svpfalse ()); // should fold to false check_result (f1z (a), svpfalse ()); check_result (f1z (b), svpfalse ()); // should fold to pfirst check_result (f3z (a), svptrue_pat_b16 (SV_VL1)); check_result (f3z (b), svptrue_pat_b16 (SV_VL1)); check_result (f3z (svptrue_b8 ()), svptrue_pat_b16 (SV_VL1)); check_result (f3z (svpfalse ()), svptrue_pat_b16 (SV_VL1)); // should fold to pfirst check_result (f4m (a), svptrue_pat_b16 (SV_VL1)); check_result (f4m (b), svptrue_pat_b16 (SV_VL1)); check_result (f4m (svptrue_b8 ()), svptrue_pat_b16 (SV_VL1)); check_result (f4m (svpfalse ()), svptrue_pat_b16 (SV_VL1)); // should fold to ptrue check_result (f4z (), svptrue_b8 ()); // should fold to pfalse check_result (g1z (a), svpfalse ()); check_result (g1z (b), svpfalse ()); check_result (g1z (svptrue_b8 ()), svpfalse ()); check_result (g1z (svpfalse ()), svpfalse ()); // Should fold to ptrue check_result (g3m (a), svptrue_b8 ()); check_result (g3m (b), svptrue_b8 ()); check_result (g3m (svptrue_b8 ()), svptrue_b8 ()); check_result (g3m (svpfalse ()), svptrue_b8 ()); // should fold to pfalse check_result (g4m (a), svpfalse ()); check_result (g4m (b), svpfalse ()); check_result (g4m (svptrue_b8 ()), svpfalse ()); check_result (g4m (svpfalse ()), svpfalse ()); } > the case of m predication. The reason is that if get_gp_arg_index returns the > “normal” position of the governing predicate (i.e. for x or z predication), we > still have to adjust the position in case of m predication for a few unary > shapes. > And that brings us back to needing case distinctions for specific shapes in > gimple_folder::fold_pfalse, which Alex wanted to avoid (if I understood > correctly). > What do you think? Yeah I wanted to avoid changes to the shape class since the comments around that area indicate that it was an intentional decision not to store the location of the GP in the shape. Hence my suggestion to store this at a higher level. However I spent some time trying to understand how the parsing works, and reading more into how the intrinsics framework is structured, I think we actually do know the location of the GP but we don't store it. During apply_predication we do adjust the index based on where the predicate could be, so the shape does know where the GP is supposed to be. So instead how about the attached approach? I've tested it on a few of the cases and it gives the expected results and is a much cleaner approach I think. What do you think of going that way instead? Thanks, Tamar
rb19899.patch
Description: rb19899.patch
