> On 2 May 2025, at 10:05, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Jennifer Schmitz <jschm...@nvidia.com> writes:
>> I adjusted the patch to use && !val.is_constant () as check, such that the 
>> extraction
>> of the last element is matched by other patterns for VLS of all vector 
>> widths.
>> 256-bit and 512-bit VLS are now matched by *vec_extract<mode><Vel>_dup
>> and 1024-bit and 2048-bit by *vec_extract<mode><Vel>_ext.
>> There were existing tests for 256-bit, 512-bit, 1024-bit and 2048-bit that 
>> could be adjusted.
>> 128-bit VLS is covered by the new test in the patch. Let me know if I shall 
>> change it to be
>> more analogous to the other extract_*.c tests.
>> 
>> About implementing point (2): Do we only want the REV approach you outlined 
>> for VLA,
>> now that VLS is redirected to other patterns?
> 
> With VLS, we want to use the non-REV path for anything other than the
> last element.  We also want to use the non-REV path for 128-bit SVE.
> For 256-bit and 512-bit SVE, either the REV path or the DUP path should
> be ok on Arm uarches.  For 1024-bit SVE and 2048-bit SVE, there's the
> wrinkle that the VLS path (using EXT) requires tied operands, so there
> might be an argument that using REV is better for the last element.
> But it's hard to speculate what a 1024-bit SVE uarch would look like.
> 
> So I think for now it's simpler to stick to using REV for the VLA cases.
> We can revisit later if there's a specific reason to do something
> more complex.
Sounds good, I will write a patch implementing the REV approach for VLA.
> 
>> [...]
>> /* Also used to move the result of a non-Advanced SIMD extract.  */
>> -/* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[0\]\n} 2 
>> } } */
>> +/* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[0\]\n} 3 
>> } } */
>> /* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[1\]\n} 1 
>> } } */
>> /* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[15\]\n} 1 
>> } } */
>> /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[16\]\n} 
>> 1 } } */
>> -/* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], 
>> z[0-9]+\.b\n} 1 } } */
> 
> It looks like this should be replaced by a DUP .B[31], rather than
> just removed.
Thanks for spotting that. I pushed the patch to trunk: 
cdfa963cfc6849ff3ceb911f293201882aeef22e.
Best,
Jennifer
> 
> LGTM otherwise, so ok with that change.
> 
> Thanks,
> Richard

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to