Jennifer Schmitz <jschm...@nvidia.com> writes:
> I adjusted the patch to use && !val.is_constant () as check, such that the 
> extraction
> of the last element is matched by other patterns for VLS of all vector widths.
> 256-bit and 512-bit VLS are now matched by *vec_extract<mode><Vel>_dup
> and 1024-bit and 2048-bit by *vec_extract<mode><Vel>_ext.
> There were existing tests for 256-bit, 512-bit, 1024-bit and 2048-bit that 
> could be adjusted.
> 128-bit VLS is covered by the new test in the patch. Let me know if I shall 
> change it to be
> more analogous to the other extract_*.c tests.
>
> About implementing point (2): Do we only want the REV approach you outlined 
> for VLA,
> now that VLS is redirected to other patterns?

With VLS, we want to use the non-REV path for anything other than the
last element.  We also want to use the non-REV path for 128-bit SVE.
For 256-bit and 512-bit SVE, either the REV path or the DUP path should
be ok on Arm uarches.  For 1024-bit SVE and 2048-bit SVE, there's the
wrinkle that the VLS path (using EXT) requires tied operands, so there
might be an argument that using REV is better for the last element.
But it's hard to speculate what a 1024-bit SVE uarch would look like.

So I think for now it's simpler to stick to using REV for the VLA cases.
We can revisit later if there's a specific reason to do something
more complex.

> [...]
>  /* Also used to move the result of a non-Advanced SIMD extract.  */
> -/* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[0\]\n} 2 
> } } */
> +/* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[0\]\n} 3 
> } } */
>  /* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[1\]\n} 1 
> } } */
>  /* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b\[15\]\n} 1 
> } } */
>  /* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.b, z[0-9]+\.b\[16\]\n} 
> 1 } } */
> -/* { dg-final { scan-assembler-times {\tlastb\tw[0-9]+, p[0-7], 
> z[0-9]+\.b\n} 1 } } */

It looks like this should be replaced by a DUP .B[31], rather than
just removed.

LGTM otherwise, so ok with that change.

Thanks,
Richard

Reply via email to