Hi Juzhe,
the general method seems sane and useful (it's not very complicated).
I was just distracted by
> Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the
> common expression:
> { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... }
>
> For this selector, we can use vmsltu + vmerge to optimize the codegen.
because it's actually { 0, nunits + 1, 2, nunits + 3, ... }, or equivalently
{ 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }.
Because of this ascending (monotonic?) selector structure, where element i
is always either i or nunits + i, we can use vmerge instead of vrgather.
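To make the condition concrete, here is a minimal standalone sketch in plain C
(not GCC internals). The helper name merge_mask_from_selector and the
fixed-length array interface are made up for illustration and are not part of
the patch. It checks that every selector element stays in its own lane and
derives the 0/1 merge mask from that:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch.  Return true if every
   selector element i is either i (take lane i from v0) or nunits + i
   (take lane i from v1).  On success mask[i] is false for v0 and true
   for v1.  On failure the contents of mask are unspecified.  */
static bool
merge_mask_from_selector (const unsigned *sel, size_t nunits, bool *mask)
{
  for (size_t i = 0; i < nunits; i++)
    {
      if (sel[i] == i)
        mask[i] = false;
      else if (sel[i] == nunits + i)
        mask[i] = true;
      else
        /* The element moves across lanes, so a plain merge cannot
           express this permutation.  */
        return false;
    }
  return true;
}
```

With nunits = 16 the selector quoted above passes this check, while a
selector that starts { 0, nunits + 1, 1, ... } fails at index 2.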
> +/* Recognize the patterns that we can use merge operation to shuffle the
> + vectors. The value of Each element (index i) in selector can only be
> + either i or nunits + i.
> +
> + E.g.
> + v = VEC_PERM_EXPR (v0, v1, selector),
> + selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... }
Same.
> +
> + We can transform such pattern into:
> +
> + v = vcond_mask (v0, v1, mask),
> + mask = { 0, 1, 0, 1, 0, 1, ... }. */
> +
> +static bool
> +shuffle_merge_patterns (struct expand_vec_perm_d *d)
> +{
> + machine_mode vmode = d->vmode;
> + machine_mode sel_mode = related_int_vector_mode (vmode).require ();
> + int n_patterns = d->perm.encoding ().npatterns ();
> + poly_int64 vec_len = d->perm.length ();
> +
> + for (int i = 0; i < n_patterns; ++i)
> + if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i))
> + return false;
> +
> + for (int i = n_patterns; i < n_patterns * 2; i++)
> + if (!d->perm.series_p (i, n_patterns, i, n_patterns)
> + && !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
> + return false;
Maybe add a comment that we check that the pattern is actually monotonic
or however you prefer to call it?
I didn't go through all tests in detail but skimmed several. All in all
looks good to me.
Regards
Robin