On Fri, 7 Dec 2012, Michael Zolotukhin wrote:

1) Does the root problem lie in the fact that even for scalar
additions we perform the addition on the whole vector and only then
drop the higher parts of the vector? I.e. to fix the test from the PR
we need to replace plus on vector mode with plus on scalar mode?

The root problem is that we model the subs[sd] instructions as taking a 128-bit second operand, when Intel's documentation says they take a 32/64-bit operand, which is an important difference for memory operands (and constants). Writing a pattern that reconstructs the result from a scalar operation also seems more natural than pretending we are doing a parallel operation and dropping most of it (easier for recog and friends).

(note: I think the insn was written to support the intrinsic, which does take a 128-bit argument, so it did a good job for that)
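
To make the difference between the two modellings concrete, here is a rough sketch in plain C. It is not GCC code and not the actual insn pattern; the type v2df and both function names are made up for illustration, with element 0 standing for the low half of the register. The first function mimics the current description (full-width subtraction, then keep only element 0), the second one the scalar description, which reads only 64 bits of the second operand.

```c
/* Hypothetical model of a V2DF register: element 0 is the low half. */
typedef struct { double e[2]; } v2df;

/* "Parallel" modelling: subtract the whole vectors, then keep element 0
   of the difference and take element 1 from the first operand.
   The second operand is a full 128-bit value here. */
v2df sub_sd_parallel(v2df a, v2df b)
{
    v2df diff = { { a.e[0] - b.e[0], a.e[1] - b.e[1] } };
    v2df r = { { diff.e[0], a.e[1] } };
    return r;
}

/* "Scalar" modelling: do one scalar subtraction and rebuild the vector
   around it.  Only a single double is read from the second operand,
   which is what matters when that operand lives in memory. */
v2df sub_sd_scalar(v2df a, const double *b)
{
    v2df r = { { a.e[0] - *b, a.e[1] } };
    return r;
}
```

Both functions give the same result whenever the low element of the second operand is the same, but only the second one can be fed a pointer to a lone double without pretending that 128 bits are available there.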

2) Is one of the main requirements having the same pattern for V4SF
and V2DF version?

Uros seems to think that would be best.

3) I don't see vec_concat in patterns from your patches, is it
explicitly generated by some x86-expander?

It is generated by ix86_expand_vector_set.

Anyway, I really like the idea of having some uniformity in describing
patterns for scalar instructions, so thank you for the work!

For 2-element vectors, vec_concat does seem more natural than vec_merge. If we choose vec_merge as the canonical representation, we should choose it for setting an element in a vector (ix86_expand_vector_set) everywhere, not just for those scalarish operations.

So it would be good to have rth's opinion on this (svn blame seems to indicate he is the one who chose to use vec_concat specifically for V2DF instead of vec_merge).
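
For reference, the two candidate representations for setting one element of a 2-element vector can be modelled in plain C as below. This is only a sketch of the RTL semantics as I understand them (vec_merge takes element i from its first operand when bit i of the mask is set, otherwise from the second; vec_concat glues two scalars together). The C names mirror the RTL codes but are otherwise hypothetical helpers, and this is not the exact RTL that ix86_expand_vector_set emits.

```c
/* Hypothetical model of a 2-element vector; element 0 is the low half. */
typedef struct { double e[2]; } vec2;

/* (vec_duplicate s): broadcast a scalar to every element. */
vec2 vec_duplicate(double s)
{
    vec2 r = { { s, s } };
    return r;
}

/* (vec_merge x y mask): element i comes from x if bit i of mask is set,
   otherwise from y. */
vec2 vec_merge(vec2 x, vec2 y, unsigned mask)
{
    vec2 r;
    for (int i = 0; i < 2; i++)
        r.e[i] = ((mask >> i) & 1u) ? x.e[i] : y.e[i];
    return r;
}

/* (vec_select v (parallel [i])): extract a single element. */
double vec_select1(vec2 v, int i)
{
    return v.e[i];
}

/* (vec_concat lo hi): build a vector from two scalars. */
vec2 vec_concat(double lo, double hi)
{
    vec2 r = { { lo, hi } };
    return r;
}

/* Set element i of v to s, vec_merge style. */
vec2 set_elt_merge(vec2 v, double s, int i)
{
    return vec_merge(vec_duplicate(s), v, 1u << i);
}

/* Set element i of v to s, vec_concat style (only sensible for 2 elements). */
vec2 set_elt_concat(vec2 v, double s, int i)
{
    return i == 0 ? vec_concat(s, vec_select1(v, 1))
                  : vec_concat(vec_select1(v, 0), s);
}
```

The two forms compute the same vector; the question is only which one should be canonical, since the vec_concat form has no direct analogue for vectors with more than two elements.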

--
Marc Glisse
