On 9 May 2012 11:18, Christophe Lyon <christophe.l...@st.com> wrote:
> Hello,
>
> On ARM+Neon, the expansion of vld1q_dup_s64() and vld1q_dup_u64() builtins
> currently fails to load the second vector element.

Thanks for the patch, but this is not acceptable as it stands today.
At the very least, you need to set the length attribute to 8 for the
appropriate alternative in this case. You also don't mention how this
patch was tested. Alternatively, it might be worth splitting the
vld1q_*64 case into a 64-bit load into a (subreg:DI (V2DI reg) 0)
followed by a subreg-to-subreg move, which should have the same
effect. That splitting would allow for better instruction
scheduling. In addition, it would be nice to have a testcase in
gcc.target/arm.
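For illustration, the check that such a gcc.target/arm testcase might
perform can be sketched as below. The helper names load_dup_s64 and
vld1q_dup_s64_fills_both_lanes are hypothetical. On a NEON target the
code uses the real vld1q_dup_s64 and vst1q_s64 intrinsics from
arm_neon.h; elsewhere it falls back to a plain C model of the intended
semantics so that it still compiles. A real testcase would also need
the usual DejaGnu directives and a main that calls abort () when the
check fails, and vld1q_dup_u64 could be checked the same way.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Load one 64-bit value and replicate it into both lanes using the
   intrinsic under test, then store the whole vector for checking. */
static void load_dup_s64(const int64_t *p, int64_t out[2])
{
  int64x2_t v = vld1q_dup_s64(p);
  vst1q_s64(out, v);
}
#else
/* Hypothetical reference model for hosts without NEON:
   both lanes must hold the loaded value. */
static void load_dup_s64(const int64_t *p, int64_t out[2])
{
  out[0] = *p;
  out[1] = *p;
}
#endif

/* Returns 1 if both lanes hold the loaded value, 0 otherwise.
   With the reported bug, lane 1 would hold whatever the register
   previously contained rather than src; a distinctive constant makes
   an accidental match unlikely. */
int vld1q_dup_s64_fills_both_lanes(void)
{
  const int64_t src = INT64_C(0x0123456789abcdef);
  int64_t out[2];
  load_dup_s64(&src, out);
  return out[0] == src && out[1] == src;
}
```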

As a follow-up patch, I'd like these patterns merged with the vdup_n
patterns in neon.md (allowing them to grow a memory-operand variant),
which should then allow merging of (I think)

scalarval = scalar_load ()
vreg = vdup (scalarval)

into

vreg = vld1_dup_n (scalar_address)
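In C terms, the source shape described here is roughly a scalar load
followed by a duplicate into every lane, which is semantically the
same as a load-and-duplicate from memory. The sketch below uses the
32-bit intrinsics vdupq_n_s32 and vld1q_dup_s32 from arm_neon.h for
illustration; the helper names are hypothetical, and the non-NEON
branch is only a plain C model so the comparison compiles on any host.
Whether GCC actually merges the two forms depends on the proposed
pattern work, not on this code.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Scalar load followed by a dup: the shape the merged patterns
   would be expected to match. */
static void load_then_dup(const int32_t *p, int32_t out[4])
{
  int32_t scalarval = *p;
  int32x4_t vreg = vdupq_n_s32(scalarval);
  vst1q_s32(out, vreg);
}

/* Direct load-and-duplicate from memory. */
static void load_dup(const int32_t *p, int32_t out[4])
{
  int32x4_t vreg = vld1q_dup_s32(p);
  vst1q_s32(out, vreg);
}
#else
/* Plain C model for hosts without NEON: both forms fill every lane. */
static void load_then_dup(const int32_t *p, int32_t out[4])
{
  int32_t scalarval = *p;
  for (int i = 0; i < 4; i++)
    out[i] = scalarval;
}

static void load_dup(const int32_t *p, int32_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = *p;
}
#endif

/* Returns 1 if both forms put *p in every lane, 0 otherwise. */
int dup_forms_agree(const int32_t *p)
{
  int32_t a[4], b[4];
  load_then_dup(p, a);
  load_dup(p, b);
  for (int i = 0; i < 4; i++)
    if (a[i] != *p || b[i] != *p)
      return 0;
  return 1;
}
```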

Thanks,
Ramana
