Hi Surya,

On Fri, 2026-03-27 at 13:47 +0530, Surya Kumari Jangala wrote:
> Hi Avinash,
> 
> On 27/03/26 12:45 PM, Avinash Jayakar wrote:
> > Hi Surya,
> > On Fri, 2026-03-27 at 11:50 +0530, Surya Kumari Jangala wrote:
> > > Hi Avinash,
> > > 
> > > On 25/03/26 11:04 PM, Avinash Jayakar wrote:
> > > > Hi Kishan,
> > > > 
> > > > I think similar changes are needed as the RFC2657(AES) patch.
> > > > 
> > > > On Wed, 2026-03-11 at 14:08 +0530, Kishan Parmar wrote:
> > > > > Hi All,
> > > > > 
> > > > > Following patch depends on these 2 patches in the following
> > > > > order:
> > > > > 1. mcpu=future:
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2025-December/703739.html
> > > > > 2. future builtin infra:
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2026-March/709782.html
> > > > > 
> > > > > Bootstrapped and regtested on powerpc64le-linux-gnu with no
> > > > > regressions.
> > > > > 
> > > > > Changes from v1:
> > > > >       - Add missing author line:
> > > > >       2025-03-11  Kishan Parmar  <[email protected]>
> > > > > 
> > > > > Thanks and regards,
> > > > > Kishan Parmar
> > > > > 
> > > > > Add support for vector uncompress and unpack instructions
> > > > > proposed in
> > > > > RFC02691.  These instructions may or may not be added to a
> > > > > future
> > > > > Power
> > > > > processor, and the names of the builtins may change in the
> > > > > future.
> > > > > 
> > > > > The instructions are exposed through new builtins and
> > > > > intrinsics
> > > > > interfaces and are enabled when compiling with -mcpu=future.
> > > > > 
> > > > > This patch adds RTL patterns for vector uncompress (nibble,
> > > > > byte,
> > > > > and
> > > > > halfword) and unpack operations in altivec.md, along with the
> > > > > corresponding builtin definitions in rs6000-builtins.def and
> > > > > overload
> > > > > entries in rs6000-overload.def.
> > > > > 
> > > > > The following new builtins are provided:
> > > > > 
> > > > > vector unsigned short vec_uncompresshn (vector unsigned char,
> > > > > vector
> > > > > unsigned int)
> > > > You can align the arguments on a new line, as in C files, to fit
> > > > the 72-char length (as done for RFC2657).
> > > > > vector unsigned int vec_uncompresshb (vector unsigned short,
> > > > > vector
> > > > > unsigned short)
> > > > > vector unsigned long long vec_uncompresshh (vector unsigned
> > > > > int,
> > > > > vector unsigned char)
> > > > > vector unsigned short vec_uncompressln (vector unsigned char,
> > > > > vector
> > > > > unsigned int)
> > > > > vector unsigned int vec_uncompresslb (vector unsigned short,
> > > > > vector
> > > > > unsigned short)
> > > > > vector unsigned long long vec_uncompresslh (vector unsigned
> > > > > int,
> > > > > vector unsigned char)
> > > > > vector signed char vec_unpack_hsn_to_byte (vector unsigned
> > > > > long
> > > > > long)
> > > > > vector signed char vec_unpack_lsn_to_byte (vector unsigned
> > > > > long
> > > > > long)
> > > > > vector unsigned char vec_unpack_int4_to_bf16 (vector unsigned
> > > > > short,
> > > > > const int<2>)
> > > > > vector unsigned char vec_unpack_int8_to_bf16 (vector unsigned
> > > > > short,
> > > > > const int<1>)
> > > > > vector float vec_unpack_int4_to_fp32 (vector unsigned int,
> > > > > const
> > > > > int<3>)
> > > > > vector float vec_unpack_int8_to_fp32 (vector unsigned int,
> > > > > const
> > > > > int<2>)
> > > > > 
> > > > > 2025-03-11  Kishan Parmar  <[email protected]>
> > > > Update year
> > > > > 
> > > > > gcc/ChangeLog:
> > > > No new line needed
> > > > > 
> > > > >       * config/rs6000/altivec.md (altivec_vupkhsntob): New
> > > > > define_insn for
> > > > >       vupk[lh]sntob.
> > > > New define_insn should suffice.
> > > > Also need to add all UNSPEC enum values.
> > > > >       (altivec_vupklsntob): Likewise.
> > > > >       (altivec_vupkint4tobf16): New define_insn
> > > > > vupkint4tobf16.
> > > > >       (altivec_vupkint8tobf16): New define_insn
> > > > > vupkint8tobf16.
> > > > >       (altivec_vupkint4tofp32): New define_insn
> > > > > vupkint4tofp32.
> > > > >       (altivec_vupkint8tofp32): New define_insn
> > > > > vupkint8tofp32.
> > > > >       (vu_hl): New attribute.
> > > > >       (vu_lh): Likewise.
> > > > >       (vucmpr_splt_val): Likewise.
> > > > >       (VUCMPR_N): New int iterator.
> > > > >       (VUCMPR_B): Likewise.
> > > > >       (VUCMPR_H): Likewise.
> > > > >       (altivec_vucmpr<vu_hl>n): New define_expand.
> > > > >       (altivec_vucmpr<vu_hl>n_direct): New define_insn for
> > > > > vucmpr<vu_hl>n.
> > > > >       (altivec_vucmpr<vu_hl>b): New define_expand.
> > > > >       (altivec_vucmpr<vu_hl>b_direct): New define_insn for
> > > > > vucmpr<vu_hl>b.
> > > > >       (altivec_vucmpr<vu_hl>h): New define_expand.
> > > > >       (altivec_vucmpr<vu_hl>h_direct): New define_insn for
> > > > > vucmpr<vu_hl>h.
> > > > >       * config/rs6000/rs6000-builtins.def: Add vector
> > > > > uncompress
> > > > > and unpack
> > > > >       builtins under [future].
> > > > >       * config/rs6000/rs6000-overload.def: Add
> > > > > vec_uncompress*
> > > > > and
> > > > > vec_unpack*
> > > > >       interfaces.
> > > > > 
> > > > > gcc/testsuite/ChangeLog:
> > > > Likewise
> > > > > 
> > > > >       * gcc.target/powerpc/future-vucmpr.c: New test.
> > > > >       * gcc.target/powerpc/future-vupk.c: New test.
> > > > > ---
> > > > >  gcc/config/rs6000/altivec.md                  | 182
> > > > > ++++++++++++++++++
> > > > >  gcc/config/rs6000/rs6000-builtins.def         |  38 ++++
> > > > >  gcc/config/rs6000/rs6000-overload.def         |  48 +++++
> > > > >  .../gcc.target/powerpc/future-vucmpr.c        |  60 ++++++
> > > > >  .../gcc.target/powerpc/future-vupk.c          |  48 +++++
> > > > >  5 files changed, 376 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/future-
> > > > > vucmpr.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/future-
> > > > > vupk.c
> > > > > 
> > > > > diff --git a/gcc/config/rs6000/altivec.md
> > > > > b/gcc/config/rs6000/altivec.md
> > > > > index 129f56245cd..e40ed7b442d 100644
> > > > > --- a/gcc/config/rs6000/altivec.md
> > > > > +++ b/gcc/config/rs6000/altivec.md
> > > > > @@ -171,6 +171,16 @@
> > > > >     UNSPEC_SLDB
> > > > >     UNSPEC_SRDB
> > > > >     UNSPEC_VECTOR_SHIFT
> > > > > +   UNSPEC_VUCMPRHN
> > > > > +   UNSPEC_VUCMPRLN
> > > > > +   UNSPEC_VUCMPRHB
> > > > > +   UNSPEC_VUCMPRLB
> > > > > +   UNSPEC_VUCMPRHH
> > > > > +   UNSPEC_VUCMPRLH
> > > > > +   UNSPEC_VUPKINT4TOBF16
> > > > > +   UNSPEC_VUPKINT8TOBF16
> > > > > +   UNSPEC_VUPKINT4TOFP32
> > > > > +   UNSPEC_VUPKINT8TOFP32
> > > > >  ])
> > > > >  
> > > > >  (define_c_enum "unspecv"
> > > > > @@ -4826,3 +4836,175 @@
> > > > >                                 (match_dup 3)]
> > > > >                                UNSPEC_BCD_ADD_SUB)
> > > > >                   (match_dup 4)))])])
> > > > > +
> > > > > +;; Vector unpack instructions for future.
> > > > > +
> > > > > +(define_insn "altivec_vupkhsntob"
> > > > > +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> > > > > +        (unspec:V16QI [(match_operand:V2DI 1
> > > > > "register_operand"
> > > > > "v")]
> > > > > +                        UNSPEC_VUNPACK_HI_SIGN))]
> > > > > +  "TARGET_FUTURE"
> > > > > +{
> > > > > +  if (BYTES_BIG_ENDIAN)
> > > > > +    return "vupkhsntob %0, %1";
> > > > > +  else
> > > > > +    return "vupklsntob %0, %1";
> > > > I have a doubt here about the little-endian case. In the prior
> > > > unpack patterns, the output elements are roughly double the width
> > > > of the input elements. For example, if the input is V16QI then
> > > > the output is V8HI. In that case, I think it produces correct
> > > > results in the register.
> > > > But here, since the input is V2DI and the output is V16QI, the
> > > > results may not be correct, and a compensating instruction may
> > > > be needed to match.
> > > 
> > > What is logical view and physical view? Do you mean memory view
> > > and
> > > register view, respectively?
> > 
> > No this is not what I meant.
> > 
> > It is the same register, viewed in two different ways. By the
> > logical view I mean what is visible to the programmer, e.g. in a
> > debugger. Suppose the programmer loads the value
> > 0x 00 01 02 03 04 05 06 07 00 ... with a simple load such as lxv
> > into VSX register vs1, whose type is 'vector char'. The logical
> > view will be the same in le and be.
> > Physically, however, the order of elements in le will be reversed,
> > i.e., le will have
> > [00 00 00 00 00 00 00 00 07 06 05 04 03 02 01 00]
> > but be would have
> > [00 01 02 03 04 05 06 07 00 00 00 00 00 00 00 00]
> 
> If the input register has [xx xx xx xx xx xx xx xx | 01 02 03 04 05
> 06 07 08],

This will appear to the programmer in le as
[01 02 03 04 05 06 07 08 | xx xx xx xx xx xx xx xx]

> then vupklsntob produces:
> [00 01 02 03 04 05 06 07 00 00 00 00 00 00 00 00]

If I read the RFC in OPF correctly, I think it would produce
[00 01 00 02 00 03 00 04 00 05 00 06 00 07 00 08]
as each nibble is sign-extended to a byte.

These will be the actual contents of the register. A programmer on a
little-endian system would see the first element (index 0) of this byte
array as 08 instead of 00, so the register would hold an incorrect
value. With the current implementation, the results observed in le and
be would likely differ.

The main problem comes from treating the 16-element array of nibbles as
a doubleword. The order of nibbles within the doubleword will be
big-endian even on an le system.
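
To make the ordering argument concrete, here is a small portable C model.
It is only a sketch under stated assumptions: model_unpack_nibbles and
reverse_view are hypothetical helper names, and the nibble walk order
(left to right across the doubleword, high nibble first) is my reading
of the example above, not the RFC's definition of vupklsntob. Note that
under plain 4-bit sign extension the nibble 8 becomes f8 rather than 08;
the ordering question is independent of that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend one 4-bit value to a signed byte.  */
static int8_t sign_extend_nibble(uint8_t nibble)
{
  assert(nibble < 16);
  /* Values 8..15 have the nibble's sign bit set.  */
  return (int8_t)(nibble >= 8 ? (int)nibble - 16 : (int)nibble);
}

/* ASSUMPTION (hypothetical model, not the ISA definition): unpack the
   16 nibbles of an 8-byte source into 16 bytes, walking the source in
   register order, left to right, high nibble of each byte first.  */
void model_unpack_nibbles(const uint8_t src[8], int8_t dst[16])
{
  for (size_t i = 0; i < 8; i++)
    {
      dst[2 * i] = sign_extend_nibble(src[i] >> 4);
      dst[2 * i + 1] = sign_extend_nibble(src[i] & 0x0f);
    }
}

/* Renumber a 16-byte register image so that element 0 is taken from
   the opposite end, as byte elements are numbered on little-endian.  */
void reverse_view(const int8_t in[16], int8_t out[16])
{
  for (size_t i = 0; i < 16; i++)
    out[i] = in[15 - i];
}
```

Feeding the bytes 01 02 03 04 05 06 07 08 through model_unpack_nibbles
and then reverse_view puts the byte unpacked from the last source nibble
at index 0, which is the mismatch described above: the element order
inside the doubleword does not follow the le element numbering.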



> 
> Note that this is a vector of bytes and this is not a large 128bit
> number.
> The debugger should treat it appropriately.
> 
> > 
> > > 
> > > > E.g.,
> > > > Input register
> > > > v1 = [01 02 03 04 05 06 07 08 | xx xx xx xx xx xx xx xx]
> > > > (logical
> > > > view)
> > > > [xx xx xx xx xx xx xx xx | 01 02 03 04 05 06 07 08] (physical
> > > > view)
> > > > Note the order of elements are flipped in le.
> > > > 
> > > > output register
> > > > expected logical view
> > > > [00 01 00 02 00 03 00 04 00 05 00 06 00 07 00 08]
> > > > expected physical view
> > > > [08 00 07 00 06 00 05 00 04 00 03 00 02 00 01 00]
> > > > 
> > > > But if we just run vupklsntob, what we would get is
> > > > actual physical view
> > > > [00 01 00 02 00 03 00 04 00 05 00 06 00 07 00 08]
> > > > actual logical view
> > > > [08 00 07 00 06 00 05 00 04 00 03 00 02 00 01 00]
> > > > which would be the reverse of what we are expecting
> > > 
> > > I would expect that a vector of bytes would be stored back into
> > > memory
> > > using stxvb16x instruction.
> > > 
> > Right, but even then I think results will be wrong if a different
> > operation uses this result.
> 
> As long as the register/memory has the correct value, I don't see any
> issue here if a subsequent instruction uses this value.

My point is that in le, the register itself would hold an incorrect
value.

-Avinash

> 
> -Surya
> 
> > 
> > Thanks,
> > Avinash
> > 
> > > -Surya
> > > 
> > > > 
> > > > Please let me know if this analysis is wrong somewhere. If not,
> > > > then I
> > > > believe we need to add instructions to get the correct result.
> > > > 
> > > > Likewise for other patterns.
> > > > 
> > > > > +}
> > > > > +  [(set_attr "type" "vecperm")])
> > > > > +
> > > > > +(define_insn "altivec_vupklsntob"
> > > > > +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> > > > > +        (unspec:V16QI [(match_operand:V2DI 1
> > > > > "register_operand"
> > > > > "v")]
> > > > > +                        UNSPEC_VUNPACK_LO_SIGN))]
> > > > > +  "TARGET_FUTURE"
> > > > > +{
> > > > > +  if (BYTES_BIG_ENDIAN)
> > > > > +    return "vupklsntob %0, %1";
> > > > > +  else
> > > > > +    return "vupkhsntob %0, %1";
> > > > > +}
> > > > > +  [(set_attr "type" "vecperm")])
> > > > > +
> > > > > +(define_insn "altivec_vupkint4tobf16"
> > > > > +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> > > > > +        (unspec:V8HI [(match_operand:V8HI 1
> > > > > "register_operand"
> > > > > "v")
> > > > > +                      (match_operand:QI 2
> > > > > "const_0_to_3_operand"
> > > > > "i")]
> > > > > +                      UNSPEC_VUPKINT4TOBF16))]
> > > > > +  "TARGET_FUTURE"
> > > > > +  "vupkint4tobf16 %0, %1, %2"
> > > > > +  [(set_attr "type" "vecperm")])
> > > > > +
> > > > > +(define_insn "altivec_vupkint8tobf16"
> > > > > +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> > > > > +        (unspec:V8HI [(match_operand:V8HI 1
> > > > > "register_operand"
> > > > > "v")
> > > > > +                      (match_operand:QI 2
> > > > > "const_0_to_1_operand"
> > > > > "i")]
> > > > > +                      UNSPEC_VUPKINT8TOBF16))]
> > > > > +  "TARGET_FUTURE"
> > > > > +  "vupkint8tobf16 %0, %1, %2"
> > > > > +  [(set_attr "type" "vecperm")])
> > > > > +
> > > > > +(define_insn "altivec_vupkint4tofp32"
> > > > > +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> > > > > +        (unspec:V4SI [(match_operand:V4SI 1
> > > > > "register_operand"
> > > > > "v")
> > > > > +                      (match_operand:QI 2
> > > > > "const_0_to_7_operand"
> > > > > "i")]
> > > > > +                      UNSPEC_VUPKINT4TOFP32))]
> > > > > +  "TARGET_FUTURE"
> > > > > +  "vupkint4tofp32 %0, %1, %2"
> > > > > +  [(set_attr "type" "vecperm")])
> > > > > +
> > > > > +(define_insn "altivec_vupkint8tofp32"
> > > > > +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> > > > Can we use V4SF instead?
> > > > > +        (unspec:V4SI [(match_operand:V4SI 1
> > > > > "register_operand"
> > > > > "v")
> > > > > +                      (match_operand:QI 2
> > > > > "const_0_to_3_operand"
> > > > > "i")]
> > > > > +                      UNSPEC_VUPKINT8TOFP32))]
> > > > > +  "TARGET_FUTURE"
> > > > > +  "vupkint8tofp32 %0, %1, %2"
> > > > > +  [(set_attr "type" "vecperm")])
> > > > > +
> > > > > +(define_int_attr vu_hl [(UNSPEC_VUCMPRHN "h")
> > > > > (UNSPEC_VUCMPRLN
> > > > > "l")
> > > > > +                        (UNSPEC_VUCMPRHB "h")
> > > > > (UNSPEC_VUCMPRLB
> > > > > "l")
> > > > > +                        (UNSPEC_VUCMPRHH "h")
> > > > > (UNSPEC_VUCMPRLH
> > > > > "l")])
> > > > > +
> > > > > +(define_int_attr vu_lh [(UNSPEC_VUCMPRHN "l")
> > > > > (UNSPEC_VUCMPRLN
> > > > > "h")
> > > > > +                        (UNSPEC_VUCMPRHB "l")
> > > > > (UNSPEC_VUCMPRLB
> > > > > "h")
> > > > > +                        (UNSPEC_VUCMPRHH "l")
> > > > > (UNSPEC_VUCMPRLH
> > > > > "h")])
> > > > > +
> > > > > +(define_int_attr vucmpr_splt_val [(UNSPEC_VUCMPRHN "3")
> > > > > (UNSPEC_VUCMPRLN "2")
> > > > > +                                  (UNSPEC_VUCMPRHB "7")
> > > > > (UNSPEC_VUCMPRLB "6")
> > > > > +                                  (UNSPEC_VUCMPRHH "15")
> > > > > (UNSPEC_VUCMPRLH "14")])
> > > > > +
> > > > > +;; Vector uncompress instructions for future.
> > > > > +
> > > > > +;; Vector Uncompress Nibbles
> > > > > +
> > > > > +(define_int_iterator VUCMPR_N [UNSPEC_VUCMPRHN
> > > > > UNSPEC_VUCMPRLN])
> > > > > +
> > > > > +(define_expand "altivec_vucmpr<vu_hl>n"
> > > > > +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> > > > > +        (unspec:V8HI [(match_operand:V16QI 1
> > > > > "register_operand"
> > > > > "v")
> > > > > +                       (match_operand:V4SI 2
> > > > > "register_operand"
> > > > > "v")]
> > > > > +                     VUCMPR_N))]
> > > > > +  "TARGET_FUTURE"
> > > > > +  {
> > > > > +    if (BYTES_BIG_ENDIAN)
> > > > > +      emit_insn (gen_altivec_vucmpr<vu_hl>n_direct
> > > > > (operands[0],
> > > > > operands[1], operands[2]));
> > > > > +    else
> > > > > +      {
> > > > > +        rtx tmp = gen_reg_rtx (V4SImode);
> > > > > +        emit_insn (gen_altivec_vspltw_direct (tmp,
> > > > > operands[2],
> > > > > GEN_INT (<vucmpr_splt_val>)));
> > > > > +        emit_insn (gen_altivec_vucmpr<vu_lh>n_direct
> > > > > (operands[0],
> > > > > operands[1], tmp));
> > > > > +      }
> > > > > +    DONE;
> > > > > +  })
> > > > > +
> > > > > +(define_insn "altivec_vucmpr<vu_hl>n_direct"
> > > > > +  [(set (match_operand:V8HI 0 "register_operand" "=v")
> > > > > +        (unspec:V8HI [(match_operand:V16QI 1
> > > > > "register_operand"
> > > > > "v")
> > > > > +                      (match_operand:V4SI 2
> > > > > "register_operand"
> > > > > "v")]
> > > > > +                     VUCMPR_N))]
> > > > > +  "TARGET_FUTURE"
> > > > > +  "vucmpr<vu_hl>n %0, %1, %2"
> > > > > +  [(set_attr "type" "vecperm")])
> > > > > +
> > > > > +
> > > > > +;; Vector Uncompress Bytes
> > > > > +
> > > > > +(define_int_iterator VUCMPR_B [UNSPEC_VUCMPRHB
> > > > > UNSPEC_VUCMPRLB])
> > > > > +
> > > > > +(define_expand "altivec_vucmpr<vu_hl>b"
> > > > > +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> > > > > +        (unspec:V4SI [(match_operand:V8HI 1
> > > > > "register_operand"
> > > > > "v")
> > > > > +                      (match_operand:V8HI 2
> > > > > "register_operand"
> > > > > "v")]
> > > > > +                     VUCMPR_B))]
> > > > > +  "TARGET_FUTURE"
> > > > > +  {
> > > > > +    if (BYTES_BIG_ENDIAN)
> > > > > +        emit_insn (gen_altivec_vucmpr<vu_hl>b_direct
> > > > > (operands[0],
> > > > > operands[1], operands[2]));
> > > > > +    else
> > > > > +      {
> > > > > +        rtx tmp = gen_reg_rtx (V8HImode);
> > > > > +        emit_insn (gen_altivec_vsplth_direct (tmp,
> > > > > operands[2],
> > > > > GEN_INT (<vucmpr_splt_val>)));
> > > > > +        emit_insn (gen_altivec_vucmpr<vu_lh>b_direct
> > > > > (operands[0],
> > > > > operands[1], tmp));
> > > > > +      }
> > > > > +    DONE;
> > > > > +  })
> > > > > +
> > > > > +(define_insn "altivec_vucmpr<vu_hl>b_direct"
> > > > > +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> > > > > +        (unspec:V4SI [(match_operand:V8HI 1
> > > > > "register_operand"
> > > > > "v")
> > > > > +                      (match_operand:V8HI 2
> > > > > "register_operand"
> > > > > "v")]
> > > > > +                     VUCMPR_B))]
> > > > > +  "TARGET_FUTURE"
> > > > > +  "vucmpr<vu_hl>b %0, %1, %2"
> > > > > +  [(set_attr "type" "vecperm")])
> > > > > +
> > > > > +;; Vector Uncompress Halfwords
> > > > > +
> > > > > +(define_int_iterator VUCMPR_H [UNSPEC_VUCMPRHH
> > > > > UNSPEC_VUCMPRLH])
> > > > > +
> > > > > +(define_expand "altivec_vucmpr<vu_hl>h"
> > > > > +  [(set (match_operand:V2DI 0 "register_operand" "=v")
> > > > > +        (unspec:V2DI [(match_operand:V4SI 1
> > > > > "register_operand"
> > > > > "v")
> > > > > +                      (match_operand:V16QI 2
> > > > > "register_operand"
> > > > > "v")]
> > > > > +                     VUCMPR_H))]
> > > > > +  "TARGET_FUTURE"
> > > > > +  {
> > > > > +    if (BYTES_BIG_ENDIAN)
> > > > > +        emit_insn (gen_altivec_vucmpr<vu_hl>h_direct
> > > > > (operands[0],
> > > > > operands[1], operands[2]));
> > > > > +    else
> > > > > +      {
> > > > > +        rtx tmp = gen_reg_rtx (V16QImode);
> > > > > +        emit_insn (gen_altivec_vspltb_direct (tmp,
> > > > > operands[2],
> > > > > GEN_INT (<vucmpr_splt_val>)));
> > > > > +        emit_insn (gen_altivec_vucmpr<vu_lh>h_direct
> > > > > (operands[0],
> > > > > operands[1], tmp));
> > > > > +      }
> > > > > +    DONE;
> > > > > +  })
> > > > > +
> > > > > +(define_insn "altivec_vucmpr<vu_hl>h_direct"
> > > > > +  [(set (match_operand:V2DI 0 "register_operand" "=v")
> > > > > +        (unspec:V2DI [(match_operand:V4SI 1
> > > > > "register_operand"
> > > > > "v")
> > > > > +                      (match_operand:V16QI 2
> > > > > "register_operand"
> > > > > "v")]
> > > > > +                     VUCMPR_H))]
> > > > > +  "TARGET_FUTURE"
> > > > > +  "vucmpr<vu_hl>h %0, %1, %2"
> > > > > +  [(set_attr "type" "vecperm")])
> > > > > diff --git a/gcc/config/rs6000/rs6000-builtins.def
> > > > > b/gcc/config/rs6000/rs6000-builtins.def
> > > > > index 7e5a4fb96e7..7ade43098f9 100644
> > > > > --- a/gcc/config/rs6000/rs6000-builtins.def
> > > > > +++ b/gcc/config/rs6000/rs6000-builtins.def
> > > > > @@ -3924,3 +3924,41 @@
> > > > >  
> > > > >    void __builtin_vsx_stxvp (v256, unsigned long, const v256
> > > > > *);
> > > > >      STXVP nothing {mma,pair}
> > > > > +
> > > > > +
> > > > > +[future]
> > > > > +  const vus __builtin_altivec_uncompresshn (vuc, vui);
> > > > > +    VUCMPRHN altivec_vucmprhn {}
> > > > > +
> > > > > +  const vui __builtin_altivec_uncompresshb (vus, vus);
> > > > > +    VUCMPRHB altivec_vucmprhb {}
> > > > > +
> > > > > +  const vull __builtin_altivec_uncompresshh (vui, vuc);
> > > > > +    VUCMPRHH altivec_vucmprhh {}
> > > > > +
> > > > > +  const vus __builtin_altivec_uncompressln (vuc, vui);
> > > > > +    VUCMPRLN altivec_vucmprln {}
> > > > > +
> > > > > +  const vui __builtin_altivec_uncompresslb (vus, vus);
> > > > > +    VUCMPRLB altivec_vucmprlb {}
> > > > > +
> > > > > +  const vull __builtin_altivec_uncompresslh (vui, vuc);
> > > > > +    VUCMPRLH altivec_vucmprlh {}
> > > > > +
> > > > > +  const vsc __builtin_altivec_unpack_hsn_to_byte (vull);
> > > > > +    VUPKHSNTOB altivec_vupkhsntob {}
> > > > > +
> > > > > +  const vsc __builtin_altivec_unpack_lsn_to_byte (vull);
> > > > > +    VUPKLSNTOB altivec_vupklsntob {}
> > > > > +
> > > > > +  const vuc __builtin_altivec_unpack_int4_to_bf16 (vus,
> > > > > const
> > > > > int<2>);
> > > > > +    VUPKINT4TOBF16 altivec_vupkint4tobf16 {}
> > > > > +
> > > > > +  const vuc __builtin_altivec_unpack_int8_to_bf16 (vus,
> > > > > const
> > > > > int<1>);
> > > > > +    VUPKINT8TOBF16 altivec_vupkint8tobf16 {}
> > > > > +
> > > > > +  const vf __builtin_altivec_unpack_int4_to_fp32 (vui, const
> > > > > int<3>);
> > > > > +    VUPKINT4TOFP32 altivec_vupkint4tofp32 {}
> > > > > +
> > > > > +  const vf __builtin_altivec_unpack_int8_to_fp32 (vui, const
> > > > > int<2>);
> > > > > +    VUPKINT8TOFP32 altivec_vupkint8tofp32 {}
> > > > > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > > > > b/gcc/config/rs6000/rs6000-overload.def
> > > > > index 5238c81b214..532e9c7a68a 100644
> > > > > --- a/gcc/config/rs6000/rs6000-overload.def
> > > > > +++ b/gcc/config/rs6000/rs6000-overload.def
> > > > > @@ -5015,6 +5015,54 @@
> > > > >    vd __builtin_vsx_xxsldwi (vd, vd, const int);
> > > > >      XXSLDWI_2DF  XXSLDWI_VD2
> > > > >  
> > > > > +[VEC_UCMPRHN, vec_uncompresshn, __builtin_vec_uncompresshn]
> > > > > +  vus __builtin_vec_uncompresshn (vuc, vui);
> > > > > +    VUCMPRHN
> > > > > +
> > > > > +[VEC_UCMPRHB, vec_uncompresshb, __builtin_vec_uncomresshb]
> > > > > +  vui __builtin_vec_uncomresshb (vus, vus);
> > > > > +    VUCMPRHB
> > > > > +
> > > > > +[VEC_UCMPRHH, vec_uncompresshh, __builtin_vec_uncomresshh]
> > > > > +  vull __builtin_vec_uncomresshh (vui, vuc);
> > > > > +    VUCMPRHH
> > > > > +
> > > > > +[VEC_UCMPRLN, vec_uncompressln, __builtin_vec_uncomressln]
> > > > > +  vus __builtin_vec_uncomressln (vuc, vui);
> > > > > +    VUCMPRLN
> > > > > +
> > > > > +[VEC_UCMPRLB, vec_uncompresslb, __builtin_vec_uncomresslb]
> > > > > +  vui __builtin_vec_uncomresslb (vus, vus);
> > > > > +    VUCMPRLB
> > > > > +
> > > > > +[VEC_UCMPRLH, vec_uncompresslh, __builtin_vec_uncomresslh]
> > > > > +  vull __builtin_vec_uncomresslh (vui, vuc);
> > > > > +    VUCMPRLH
> > > > > +
> > > > > +[VEC_UNPACK_HSN_TO_BYTE, vec_unpack_hsn_to_byte,
> > > > > __builtin_vec_unpack_hsn_to_byte]
> > > > > +  vsc __builtin_vec_unpack_hsn_to_byte (vull);
> > > > > +    VUPKHSNTOB
> > > > > +
> > > > > +[VEC_UNPACK_LSN_TO_BYTE, vec_unpack_lsn_to_byte,
> > > > > __builtin_vec_unpack_lsn_to_byte]
> > > > > +  vsc __builtin_vec_unpack_lsn_to_byte (vull);
> > > > > +    VUPKLSNTOB
> > > > > +
> > > > > +[VEC_UNPACK_INT4_TO_BF16, vec_unpack_int4_to_bf16,
> > > > > __builtin_vec_unpack_int4_to_bf16]
> > > > > +  vuc __builtin_vec_unpack_int4_to_bf16 (vus, const int<2>);
> > > > > +    VUPKINT4TOBF16
> > > > > +
> > > > > +[VEC_UNPACK_INT8_TO_BF16, vec_unpack_int8_to_bf16,
> > > > > __builtin_vec_unpack_int8_to_bf16]
> > > > > +  vuc __builtin_vec_unpack_int8_to_bf16 (vus, const int<1>);
> > > > > +    VUPKINT8TOBF16
> > > > > +
> > > > > +[VEC_UNPACK_INT4_TO_FP32, vec_unpack_int4_to_fp32,
> > > > > __builtin_vec_unpack_int4_to_fp32]
> > > > > +  vf __builtin_vec_unpack_int4_to_fp32 (vui, const int<3>);
> > > > > +    VUPKINT4TOFP32
> > > > > +
> > > > > +[VEC_UNPACK_INT8_TO_FP32, vec_unpack_int8_to_fp32,
> > > > > __builtin_vec_unpack_int8_to_fp32]
> > > > > +  vf __builtin_vec_unpack_int8_to_fp32 (vui, const int<2>);
> > > > > +    VUPKINT8TOFP32
> > > > > +
> > > > >  
> > > > >  ;
> > > > > *************************************************************
> > > > > ****
> > > > > ****
> > > > > *****
> > > > >  ;
> > > > > *************************************************************
> > > > > ****
> > > > > ****
> > > > > *****
> > > > > diff --git a/gcc/testsuite/gcc.target/powerpc/future-vucmpr.c
> > > > > b/gcc/testsuite/gcc.target/powerpc/future-vucmpr.c
> > > > > new file mode 100644
> > > > > index 00000000000..58ffa67ebb1
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/powerpc/future-vucmpr.c
> > > > > @@ -0,0 +1,60 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> > > > > +
> > > > > +#include <altivec.h>
> > > > > +
> > > > > +vector unsigned short test_uncompresshn(vector unsigned char
> > > > > a,
> > > > > +                                        vector unsigned int
> > > > > b)
> > > > > +{
> > > > > +  return vec_uncompresshn(a, b);
> > > > > +}
> > > > > +
> > > > > +vector unsigned int test_uncompresshb(vector unsigned short
> > > > > a,
> > > > > +                                      vector unsigned short
> > > > > b)
> > > > > +{
> > > > > +  return vec_uncompresshb(a, b);
> > > > > +}
> > > > > +
> > > > > +vector unsigned long long test_uncompresshh(vector unsigned
> > > > > int
> > > > > a,
> > > > > +                                            vector unsigned
> > > > > char
> > > > > b)
> > > > > +{
> > > > > +  return vec_uncompresshh(a, b);
> > > > > +}
> > > > > +
> > > > > +vector unsigned short test_uncompressln(vector unsigned char
> > > > > a,
> > > > > +                                        vector unsigned int
> > > > > b)
> > > > > +{
> > > > > +  return vec_uncompressln(a, b);
> > > > > +}
> > > > > +
> > > > > +vector unsigned int test_uncompresslb(vector unsigned short
> > > > > a,
> > > > > +                                      vector unsigned short
> > > > > b)
> > > > > +{
> > > > > +  return vec_uncompresslb(a, b);
> > > > > +}
> > > > > +
> > > > > +vector unsigned long long test_uncompresslh(vector unsigned
> > > > > int
> > > > > a,
> > > > > +                                            vector unsigned
> > > > > char
> > > > > b)
> > > > > +{
> > > > > +  return vec_uncompresslh(a, b);
> > > > > +}
> > > > > +
> > > > > +/* BE: direct instructions, no splats */
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not "vspltw" { target { be }
> > > > > } }
> > > > > } */
> > > > > +/* { dg-final { scan-assembler-not "vsplth" { target { be }
> > > > > } }
> > > > > } */
> > > > > +/* { dg-final { scan-assembler-not "vspltb" { target { be }
> > > > > } }
> > > > > } */
> > > > > +
> > > > > +/* LE: splats must appear */
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "vspltw" 2 { target {
> > > > > le }
> > > > > } }
> > > > > } */
> > > > > +/* { dg-final { scan-assembler-times "vsplth" 2 { target {
> > > > > le }
> > > > > } }
> > > > > } */
> > > > > +/* { dg-final { scan-assembler-times "vspltb" 2 { target {
> > > > > le }
> > > > > } }
> > > > > } */
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "vucmprln" 1 } } */
> > > > > +/* { dg-final { scan-assembler-times "vucmprlb" 1 } } */
> > > > > +/* { dg-final { scan-assembler-times "vucmprlh" 1 } } */
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "vucmprhn" 1 } } */
> > > > > +/* { dg-final { scan-assembler-times "vucmprhb" 1 } } */
> > > > > +/* { dg-final { scan-assembler-times "vucmprhh" 1 } } */
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.target/powerpc/future-vupk.c
> > > > > b/gcc/testsuite/gcc.target/powerpc/future-vupk.c
> > > > > new file mode 100644
> > > > > index 00000000000..fa4876dd7eb
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/powerpc/future-vupk.c
> > > > > @@ -0,0 +1,48 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> > > > > +
> > > > > +#include <altivec.h>
> > > > > +
> > > > > +vector signed char
> > > > > +test_unpack_hsn_to_byte(vector unsigned long long a)
> > > > > +{
> > > > > +  return vec_unpack_hsn_to_byte(a);
> > > > > +}
> > > > > +
> > > > > +vector signed char
> > > > > +test_unpack_lsn_to_byte(vector unsigned long long a)
> > > > > +{
> > > > > +  return vec_unpack_lsn_to_byte(a);
> > > > > +}
> > > > > +
> > > > > +vector unsigned char
> > > > > +test_unpack_int4_to_bf16(vector unsigned short a)
> > > > > +{
> > > > > +  return vec_unpack_int4_to_bf16(a, 0);
> > > > > +}
> > > > > +
> > > > > +vector unsigned char
> > > > > +test_unpack_int8_to_bf16(vector unsigned short a)
> > > > > +{
> > > > > +  return vec_unpack_int8_to_bf16(a, 0);
> > > > > +}
> > > > > +
> > > > > +vector float
> > > > > +test_unpack_int4_to_fp32(vector unsigned int a)
> > > > > +{
> > > > > +  return vec_unpack_int4_to_fp32(a, 0);
> > > > > +}
> > > > > +
> > > > > +vector float
> > > > > +test_unpack_int8_to_fp32(vector unsigned int a)
> > > > > +{
> > > > > +  return vec_unpack_int8_to_fp32(a, 0);
> > > > > +}
> > > > > +
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "vupkhsntob" 1  } } */
> > > > > +/* { dg-final { scan-assembler-times "vupklsntob" 1  } } */
> > > > > +/* { dg-final { scan-assembler-times "vupkint4tobf16" 1 } }
> > > > > */
> > > > > +/* { dg-final { scan-assembler-times "vupkint8tobf16" 1 } }
> > > > > */
> > > > > +/* { dg-final { scan-assembler-times "vupkint4tofp32" 1 } }
> > > > > */
> > > > > +/* { dg-final { scan-assembler-times "vupkint8tofp32" 1 } }
> > > > > */
> > > > > \ No newline at end of file
> > > > 
> > > > Along with the builtin tests, it would be good to have run-time
> > > > test cases that check the actual result. I am not sure whether
> > > > qemu can be used for that purpose, but such tests would ensure
> > > > that the code generated for both le and be is functionally
> > > > correct, at least in the future, since there have been similar
> > > > endian-related bugs such as PR119130.
> > > > 
> > > > Thanks and regards,
> > > > Avinash Jayakar
