> On 2 May 2025, at 10:05, Richard Sandiford <richard.sandif...@arm.com> wrote: > > External email: Use caution opening links or attachments > > > Jennifer Schmitz <jschm...@nvidia.com> writes: >> I adjusted the patch to use && !val.is_constant () as check, such that the >> extraction >> of the last element is matched by other patterns for VLS of all vector >> widths. >> 256-bit and 512-bit VLS are now matched by *vec_extract<mode><Vel>_dup >> and 1024-bit and 2048-bit by *vec_extract<mode><Vel>_ext. >> There were existing tests for 256-bit, 512-bit, 1024-bit and 2048-bit that >> could be adjusted. >> 128-bit VLS is covered by the new test in the patch. Let me know if I shall >> change it to be >> more analogous to the other extract_*.c tests. >> >> About implementing point (2): Do we only want the REV approach you outlined >> for VLA, >> now that VLS is redirected to other patterns? > > With VLS, we want to use the non-REV path for anything other than the > last element. We also want to use the non-REV path for 128-bit SVE. > For 256-bit and 512-bit SVE, either the REV path or the DUP path should > be ok on Arm uarches. For 1024-bit SVE and 2048-bit SVE, there's the > wrinkle that the VLS path (using EXT) requires tied operands, so there > might be an argument that using REV is better for the last element. > But it's hard to speculate what a 1024-bit SVE uarch would look like. > > So I think for now it's simpler to stick to using REV for the VLA cases. > We can revisit later if there's a specific reason to do something > more complex. Sounds good, I will write a patch implementing the REV approach for VLA. > >> [...] >> /* Also used to move the result of a non-Advanced SIMD extract. */ >> -/* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[0\]\n} 2 >> } } */ >> +/* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[0\]\n} 3 >> } } */ >> /* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[1\]\n} 1 >> } } */ >> /* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[15\]\n} 1 >> } } */ >> /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[16\]\n} >> 1 } } */ >> -/* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], >> z[0-9]+\.b\n} 1 } } */ > > It looks like this should be replaced by a DUP .B[31], rather than > just removed. Thanks for spotting that. I pushed the patch to trunk: cdfa963cfc6849ff3ceb911f293201882aeef22e. Best, Jennifer > > LGTM otherwise, so ok with that change. > > Thanks, > Richard
smime.p7s
Description: S/MIME cryptographic signature