https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107096

--- Comment #13 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to rguent...@suse.de from comment #12)
> On Wed, 15 Feb 2023, crazylht at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107096
> > 
> > --- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---
> > 
> > > 
> > > There's no other way to do N bit to two N/2 bit hi/lo (un)packing
> > > (there's a 2x N/2 bit -> N bit operation, for whatever reason).
> > > There's also no way to transform the d rgroup mask into the
> > > f rgroup mask for the first example aka duplicate bits in place,
> > > { b0, b1, b2, ... bN } -> { b0, b0, b1, b1, b2, b2, ... bN, bN },
> > > nor the reverse.
> > > 
> > 
> > Can we just do VIEW_CONVERT_EXPR for vectype instead of mask_type?
> > I.e. we can do a VCE to transform V8SI to V16HI, then use mask_load for V16HI
> > with the same mask {b0, b0, b1, b1, b2, b2, ...}, then VCE it back to V8SI.
> > It should be OK as long as the bits are duplicated in place (or VCE V16HI to
> > V8SI, then use mask {b0, b1, b2, ..., bN}, and VCE V8SI back to V16HI after
> > the masked load/move).
> 
> Hmm, yes, if we arrange for the larger mask to be available that would
> work for loads and stores I guess.  It wouldn't work for arithmetic
> cond_* IFNs though.  It's also going to be a bit tricky within the
> masking framework - I'm going to see whether that works though, it might
> be a nice way to avoid an excessive number of masks for integer code
> at least.
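A scalar C model of the VIEW_CONVERT_EXPR idea from comment #11 may help here. The model treats a V8SI as 32 bytes and uses memcpy to stand in for the VCE to V16HI and back. It checks that a V16HI masked load with the duplicated mask {b0, b0, b1, b1, ...} selects exactly the bytes that a V8SI masked load with {b0, ..., b7} selects. It also shows why, as comment #12 says, this does not carry over to arithmetic cond_* IFNs: an add on the 16-bit halves drops the carry between them. This is a sketch, not GCC code. The helper names (dup_bits, vce_masked_load_matches, vce_add_matches) are hypothetical, and zeroing the inactive lanes is an assumed convention.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Duplicate each bit of an 8-lane mask in place:
   {b0, b1, ..., b7} -> {b0, b0, b1, b1, ..., b7, b7}.  */
uint16_t dup_bits (uint8_t m)
{
  uint16_t r = 0;
  for (int i = 0; i < 8; i++)
    if (m & (1u << i))
      r |= (uint16_t) (3u << (2 * i));
  return r;
}

/* Model of a V8SI masked load; inactive lanes are zeroed (assumption).  */
static void masked_load_si (int32_t dst[8], const int32_t src[8], uint8_t m)
{
  for (int i = 0; i < 8; i++)
    dst[i] = ((m >> i) & 1) ? src[i] : 0;
}

/* Model of a V16HI masked load with the same zeroing convention.  */
static void masked_load_hi (int16_t dst[16], const int16_t src[16], uint16_t m)
{
  for (int i = 0; i < 16; i++)
    dst[i] = ((m >> i) & 1) ? src[i] : 0;
}

/* Returns 1 if "VCE to V16HI, masked load with dup_bits (m), VCE back"
   gives the same V8SI as a direct masked load with m.  memcpy stands in
   for VIEW_CONVERT_EXPR.  */
int vce_masked_load_matches (const int32_t mem[8], uint8_t m)
{
  int32_t ref[8], back[8];
  int16_t mem_hi[16], tmp_hi[16];

  masked_load_si (ref, mem, m);
  memcpy (mem_hi, mem, sizeof mem_hi);
  masked_load_hi (tmp_hi, mem_hi, dup_bits (m));
  memcpy (back, tmp_hi, sizeof back);
  return memcmp (ref, back, sizeof ref) == 0;
}

/* Returns 1 if a 32-bit add of a and b equals two independent 16-bit
   adds on the reinterpreted halves, i.e. if a V16HI cond_add could stand
   in for a V8SI one.  It fails whenever the low halves produce a carry.  */
int vce_add_matches (uint32_t a, uint32_t b)
{
  uint16_t a_hi[2], b_hi[2], s_hi[2];
  uint32_t sum_si = a + b, sum_hi;

  memcpy (a_hi, &a, sizeof a);
  memcpy (b_hi, &b, sizeof b);
  for (int i = 0; i < 2; i++)
    s_hi[i] = (uint16_t) (a_hi[i] + b_hi[i]);
  memcpy (&sum_hi, s_hi, sizeof sum_hi);
  return sum_si == sum_hi;
}
```

Both checks hold for either byte order. The masked-load equivalence only needs each mask bit to cover both halves of its lane. The add mismatch only needs a carry out of the low half, for example 0x0000ffff + 1.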

There could be some limitations on nV (should it be a power of 2 for the VCE?),
i.e. in the loop below there's no suitable vectype to VCE src1's vectype to in
order to reuse the loop mask:
void
foo (int* __restrict dest, int* src1, int* src2)
{
    for (int i = 0; i != 10000; i++)
      dest[i] = src1[3*i] + src1[3*i + 1] + src1[3*i + 2];
}
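A hedged reading of why the loop above has no suitable VCE target, assuming dest is vectorized as V8SI. One vector iteration covers 8 scalar iterations but reads 24 consecutive ints from src1, split across three V8SI loads. Loop-mask bit bi therefore has to enable three consecutive src1 elements. A VCE would need a vector of 8 lanes of 96 bits each, and no such mode exists. The per-load masks would instead have to be built from the loop mask, as this scalar sketch does. The function names are hypothetical and are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Loop-mask bit bi must enable src1[3*i], src1[3*i+1], src1[3*i+2]:
   {b0, b1, ..., b7} -> {b0, b0, b0, b1, b1, b1, ..., b7, b7, b7}.  */
uint32_t triplicate_bits (uint8_t m)
{
  uint32_t r = 0;
  for (int i = 0; i < 8; i++)
    if (m & (1u << i))
      r |= 7u << (3 * i);
  return r;
}

/* Mask for the k-th of the three V8SI loads of src1 (k = 0, 1, 2):
   bits [8k, 8k + 8) of the triplicated 24-bit mask.  */
uint8_t src1_load_mask (uint8_t loop_mask, int k)
{
  return (uint8_t) (triplicate_bits (loop_mask) >> (8 * k));
}
```

For example, if only lane 7 of the loop mask is active, only the top three elements of the third src1 load are enabled (mask 0xE0). None of the three load masks is the loop mask itself.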


Maybe AVX512 could use a gather instruction for .MASK_LOAD_LANES so that it can
use the LOOP_MASK directly?
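A scalar sketch of the gather idea, under stated assumptions. It assumes 8 lanes (VF = 8, as for V8SI) and adds a trip count n so that the final partial iteration is exercised. Each vector iteration computes the loop mask once and passes it unchanged to three masked gathers with indices {3j + k}, k = 0, 1, 2, and to the masked store. No triplicated mask is needed. masked_gather and foo_masked_gather are hypothetical models, not AVX512 intrinsics or GCC code, and zeroing inactive gather lanes is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define VF 8 /* lanes per vector, e.g. V8SI (assumed) */

/* Model of a masked gather of 8 ints: lane j loads base[idx[j]] when
   mask bit j is set, otherwise 0 (zeroing is an assumption).  */
static void masked_gather (int dst[VF], const int *base, const int idx[VF],
                           uint8_t mask)
{
  for (int j = 0; j < VF; j++)
    dst[j] = ((mask >> j) & 1) ? base[idx[j]] : 0;
}

/* Fully masked model of foo: the same loop mask drives all three
   gathers and the store.  */
void foo_masked_gather (int *dest, const int *src1, int n)
{
  for (int i = 0; i < n; i += VF)
    {
      uint8_t loop_mask = 0;
      for (int j = 0; j < VF; j++)
        if (i + j < n)
          loop_mask |= (uint8_t) (1u << j);

      int sum[VF] = { 0 };
      for (int k = 0; k < 3; k++)
        {
          int idx[VF], v[VF];
          for (int j = 0; j < VF; j++)
            idx[j] = 3 * j + k;
          masked_gather (v, src1 + 3 * i, idx, loop_mask);
          for (int j = 0; j < VF; j++)
            sum[j] += v[j];
        }

      for (int j = 0; j < VF; j++) /* masked store */
        if ((loop_mask >> j) & 1)
          dest[i + j] = sum[j];
    }
}
```

With n = 10, the second vector iteration uses loop mask 0x03. Only lanes 0 and 1 gather from src1 and store to dest, so elements past dest[9] are left untouched.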
