On 2022/8/16 14:53, Kewen.Lin wrote:
Hi Xionghu,

Thanks for the updated version of patch, some comments are inlined.

on 2022/8/11 14:15, Xionghu Luo wrote:


On 2022/8/11 01:07, Segher Boessenkool wrote:
On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
On 2022/8/9 11:01, Kewen.Lin wrote:
I have some concern on those changed "altivec_*_direct", IMHO the suffix
"_direct" is normally to indicate the define_insn is mapped to the
corresponding hw insn directly.  With this change, for example,
altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
misleading.  Maybe we can add the corresponding _direct_le and _direct_be
versions, both are mapped into the same insn but have different RTL
patterns.  Looking forward to Segher's and David's suggestions.

Thanks!  Do you mean same RTL patterns with different hw insn?

A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
instruction, never a vmrglb instead.  Misleading names are an expensive
problem.



Thanks.  Then on LE platforms, if user calls altivec_vmrghw,it will be
expanded to RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and
finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw".
For BE just strict forward, seems more clear :-), OK for master?


[PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS 
[PR106069]

v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern to be and le with same RTL but different insn.

The native RTL expression for vec_mrghw should be same for BE and LE as
they are register and endian-independent.  So both BE and LE need
generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
            (subreg:V4SI (reg:V16QI 139) 0)
            (subreg:V4SI (reg:V16QI 140) 0))
            [const_int 0 4 1 5]))

Then combine pass could do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check need only once at ASM generation finally.
ASM would be better due to nested vec_select simplified to simple scalar
load.

Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64}
Linux(Thanks to Kewen).

gcc/ChangeLog:

     PR target/106069
     * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
     (altivec_vmrghb_direct_be): New pattern for BE.
     (altivec_vmrglb_direct_le): New pattern for LE.
     (altivec_vmrghh_direct): Remove.
     (altivec_vmrghh_direct_be): New pattern for BE.
     (altivec_vmrglh_direct_le): New pattern for LE.
     (altivec_vmrghw_direct_<mode>): Remove.
     (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
     (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
     (altivec_vmrglb_direct): Remove.
     (altivec_vmrglb_direct_be): New pattern for BE.
     (altivec_vmrghb_direct_le): New pattern for LE.
     (altivec_vmrglh_direct): Remove.
     (altivec_vmrglh_direct_be): New pattern for BE.
     (altivec_vmrghh_direct_le): New pattern for LE.
     (altivec_vmrglw_direct_<mode>): Remove.
     (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
     (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
     * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
     Adjust.
     * config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

     PR target/106069
     * g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghu...@tencent.com>
---
  gcc/config/rs6000/altivec.md                | 223 ++++++++++++++------
  gcc/config/rs6000/rs6000.cc                 |  36 ++--
  gcc/config/rs6000/vsx.md                    |  24 +--
  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++
  4 files changed, 305 insertions(+), 98 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..78245f470e9 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-                        : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT 
(17),
+              GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
+              GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
+              GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
+  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })

I think you can just call gen_altivec_vmrghb_direct_be and
gen_altivec_vmrghb_direct_le separately here.  Similar for some other
define_expands.

-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
      (vec_select:V16QI
        (vec_concat:V32QI
@@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct"
               (const_int 5) (const_int 21)
               (const_int 6) (const_int 22)
               (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrghb %0,%1,%2"
    [(set_attr "type" "vecperm")])

Could you move the following altivec_vmrghb_direct_le here?
Then readers can easily check the difference between be and
le for the same altivec_vmrghb_direct.

Same comment applied for some other similar cases.

+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+    (vec_select:V16QI
+      (vec_concat:V32QI
+        (match_operand:V16QI 1 "register_operand" "v")
+        (match_operand:V16QI 2 "register_operand" "v"))
+      (parallel [(const_int 0) (const_int 16)
+             (const_int 1) (const_int 17)
+             (const_int 2) (const_int 18)
+             (const_int 3) (const_int 19)
+             (const_int 4) (const_int 20)
+             (const_int 5) (const_int 21)
+             (const_int 6) (const_int 22)
+             (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrglb %0,%2,%1"
+  [(set_attr "type" "vecperm")])

Could you update this pattern for assembly "vmrglb %0,%1,%2"
instead of "vmrglb %0,%2,%1"?  I checked the previous md
before the culprit commit 0910c516a3d72af048, it emits
"vmrglb %0,%1,%2" for altivec_vmrglb_direct.

Same comment applied for some other similar cases.

+
  (define_expand "altivec_vmrghh"
    [(use (match_operand:V8HI 0 "register_operand"))
     (use (match_operand:V8HI 1 "register_operand"))
     (use (match_operand:V8HI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-                        : gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9),
+              GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11));
+  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
+
+  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
    [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (vec_select:V8HI
+    (vec_select:V8HI
        (vec_concat:V16HI
          (match_operand:V8HI 1 "register_operand" "v")
          (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct"
               (const_int 1) (const_int 9)
               (const_int 2) (const_int 10)
               (const_int 3) (const_int 11)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrghh %0,%1,%2"
    [(set_attr "type" "vecperm")])
+(define_insn "altivec_vmrglh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+    (vec_select:V8HI
+      (vec_concat:V16HI
+        (match_operand:V8HI 1 "register_operand" "v")
+        (match_operand:V8HI 2 "register_operand" "v"))
+      (parallel [(const_int 0) (const_int 8)
+             (const_int 1) (const_int 9)
+             (const_int 2) (const_int 10)
+             (const_int 3) (const_int 11)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrglh %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrghw"
    [(use (match_operand:V4SI 0 "register_operand"))
     (use (match_operand:V4SI 1 "register_operand"))
     (use (match_operand:V4SI 2 "register_operand"))]
    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-             : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5));
+  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
      (vec_select:VSX_W
        (vec_concat:<VS_double>
@@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>"
          (match_operand:VSX_W 2 "register_operand" "wa,v"))
        (parallel [(const_int 0) (const_int 4)
               (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+    (vec_select:VSX_W
+      (vec_concat:<VS_double>
+        (match_operand:VSX_W 1 "register_operand" "wa,v")
+        (match_operand:VSX_W 2 "register_operand" "wa,v"))
+      (parallel [(const_int 0) (const_int 4)
+             (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "@
-   xxmrghw %x0,%x1,%x2
-   vmrghw %0,%1,%2"
+  xxmrglw %x0,%x2,%x1
+  vmrglw %0,%2,%1"
    [(set_attr "type" "vecperm")])
 (define_insn "*altivec_vmrghsf"
@@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb"
     (use (match_operand:V16QI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-                        : gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT 
(25),
+              GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27),
+              GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29),
+              GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31));
+  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
      (vec_select:V16QI
        (vec_concat:V32QI
@@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct"
               (const_int 13) (const_int 29)
               (const_int 14) (const_int 30)
               (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrglb %0,%1,%2"
    [(set_attr "type" "vecperm")])
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+    (vec_select:V16QI
+      (vec_concat:V32QI
+        (match_operand:V16QI 1 "register_operand" "v")
+        (match_operand:V16QI 2 "register_operand" "v"))
+      (parallel [(const_int  8) (const_int 24)
+             (const_int  9) (const_int 25)
+             (const_int 10) (const_int 26)
+             (const_int 11) (const_int 27)
+             (const_int 12) (const_int 28)
+             (const_int 13) (const_int 29)
+             (const_int 14) (const_int 30)
+             (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrghb %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrglh"
    [(use (match_operand:V8HI 0 "register_operand"))
     (use (match_operand:V8HI 1 "register_operand"))
     (use (match_operand:V8HI 2 "register_operand"))]
    "TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-                        : gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13),
+              GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15));
+  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
    [(set (match_operand:V8HI 0 "register_operand" "=v")
          (vec_select:V8HI
        (vec_concat:V16HI
@@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct"
               (const_int 5) (const_int 13)
               (const_int 6) (const_int 14)
               (const_int 7) (const_int 15)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
    "vmrglh %0,%1,%2"
    [(set_attr "type" "vecperm")])
+(define_insn "altivec_vmrghh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+      (vec_concat:V16HI
+        (match_operand:V8HI 1 "register_operand" "v")
+        (match_operand:V8HI 2 "register_operand" "v"))
+      (parallel [(const_int 4) (const_int 12)
+             (const_int 5) (const_int 13)
+             (const_int 6) (const_int 14)
+             (const_int 7) (const_int 15)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+  "vmrghh %0,%2,%1"
+  [(set_attr "type" "vecperm")])
+
  (define_expand "altivec_vmrglw"
    [(use (match_operand:V4SI 0 "register_operand"))
     (use (match_operand:V4SI 1 "register_operand"))
     (use (match_operand:V4SI 2 "register_operand"))]
    "VECTOR_MEM_ALTIVEC_P (V4SImode)"
  {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-             : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7));
+  rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
+  x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+  emit_insn (gen_rtx_SET (operands[0], x));
    DONE;
  })
-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
    [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
      (vec_select:VSX_W
        (vec_concat:<VS_double>
@@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>"
          (match_operand:VSX_W 2 "register_operand" "wa,v"))
        (parallel [(const_int 2) (const_int 6)
               (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrglw %x0,%x1,%x2
+  vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+    (vec_select:VSX_W
+      (vec_concat:<VS_double>
+        (match_operand:VSX_W 1 "register_operand" "wa,v")
+        (match_operand:VSX_W 2 "register_operand" "wa,v"))
+      (parallel [(const_int 2) (const_int 6)
+             (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
    "@
-   xxmrglw %x0,%x1,%x2
-   vmrglw %0,%1,%2"
+  xxmrghw %x0,%x2,%x1
+  vmrghw %0,%2,%1"
    [(set_attr "type" "vecperm")])
 (define_insn "*altivec_vmrglsf"
@@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
      {
        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));

Note that if you change assembly "vmrghh %0,%2,%1" to "vmrghh %0,%1,%2",
you need to change this to:

   emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));

Same comment applied for some other similar cases.

      }
    DONE;
  })
@@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
      {
        emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
      {
        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
      {
        emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
      {
        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
      {
        emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
      {
        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
@@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
      {
        emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
      }
    else
      {
        emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
        emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo));
      }
    DONE;
  })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..97da7706f63 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
rtx op1,
      {OPTION_MASK_ALTIVEC,
       CODE_FOR_altivec_vpkuwum_direct,
       {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-              : CODE_FOR_altivec_vmrglb_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be,
       {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},

Before the culprit commit 0910c516a3d72af04, we have:

     { OPTION_MASK_ALTIVEC,
       (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
        : CODE_FOR_altivec_vmrglb_direct),
       {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },

I think we should use:

     { OPTION_MASK_ALTIVEC,
       (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
        : CODE_FOR_altivec_vmrglb_direct_le),
       {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },

here instead.  Similar comment for those related below.

-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-              : CODE_FOR_altivec_vmrglh_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be,
       {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-              : CODE_FOR_altivec_vmrglw_direct_v4si,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be,
       {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-              : CODE_FOR_altivec_vmrghb_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be,
       {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-              : CODE_FOR_altivec_vmrghh_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be,
       {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-              : CODE_FOR_altivec_vmrghw_direct_v4si,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be,
       {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
      {OPTION_MASK_P8_VECTOR,
       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
@@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
rtx op1,
           /* For little-endian, the two input operands must be swapped
               (or swapped back) to ensure proper right-to-left numbering
-             from 0 to 2N-1.  */
-      if (swapped ^ !BYTES_BIG_ENDIAN
-          && icode != CODE_FOR_vsx_xxpermdi_v16qi)
+         from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */
+      if (swapped ^ !BYTES_BIG_ENDIAN)
+        if (!(icode == CODE_FOR_altivec_vmrghb_direct_be
+          || icode == CODE_FOR_altivec_vmrglb_direct_be
+          || icode == CODE_FOR_altivec_vmrghh_direct_be
+          || icode == CODE_FOR_altivec_vmrglh_direct_be
+          || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be
+          || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be
+          || icode == CODE_FOR_vsx_xxpermdi_v16qi))
          std::swap (op0, op1);

IIUC, we don't need this part of change once we fix the operand order in
the assembly for those LE "direct"s.

BR,
Kewen


Thanks.  Addressed all the comments as v4.


v4: Update per comments.
v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern to be and le with same RTL but different insn.

The native RTL expression for vec_mrghw should be same for BE and LE as
they are register and endian-independent.  So both BE and LE need
generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
                   (subreg:V4SI (reg:V16QI 139) 0)
                   (subreg:V4SI (reg:V16QI 140) 0))
                   [const_int 0 4 1 5]))

Then combine pass could do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check need only once at ASM generation finally.
ASM would be better due to nested vec_select simplified to simple scalar
load.

Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux.

gcc/ChangeLog:

        PR target/106069
        * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
        (altivec_vmrghb_direct_be): New pattern for BE.
        (altivec_vmrghb_direct_le): New pattern for LE.
        (altivec_vmrghh_direct): Remove.
        (altivec_vmrghh_direct_be): New pattern for BE.
        (altivec_vmrghh_direct_le): New pattern for LE.
        (altivec_vmrghw_direct_<mode>): Remove.
        (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
        (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
        (altivec_vmrglb_direct): Remove.
        (altivec_vmrglb_direct_be): New pattern for BE.
        (altivec_vmrglb_direct_le): New pattern for LE.
        (altivec_vmrglh_direct): Remove.
        (altivec_vmrglh_direct_be): New pattern for BE.
        (altivec_vmrglh_direct_le): New pattern for LE.
        (altivec_vmrglw_direct_<mode>): Remove.
        (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
        (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
        * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
        Adjust.
        * config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

        PR target/106069
        * g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghu...@tencent.com>
---
 gcc/config/rs6000/altivec.md                | 230 ++++++++++++++------
 gcc/config/rs6000/rs6000.cc                 |  24 +-
 gcc/config/rs6000/vsx.md                    |  28 ++-
 gcc/testsuite/g++.target/powerpc/pr106069.C | 120 ++++++++++
 4 files changed, 313 insertions(+), 89 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..962df4657e6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-                                               : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
        (vec_select:V16QI
          (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
                     (const_int 5) (const_int 21)
                     (const_int 6) (const_int 22)
                     (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+       (vec_select:V16QI
+         (vec_concat:V32QI
+           (match_operand:V16QI 2 "register_operand" "v")
+           (match_operand:V16QI 1 "register_operand" "v"))
+         (parallel [(const_int  8) (const_int 24)
+                    (const_int  9) (const_int 25)
+                    (const_int 10) (const_int 26)
+                    (const_int 11) (const_int 27)
+                    (const_int 12) (const_int 28)
+                    (const_int 13) (const_int 29)
+                    (const_int 14) (const_int 30)
+                    (const_int 15) (const_int 31)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghb %0,%1,%2"
   [(set_attr "type" "vecperm")])
@@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-                                               : gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (vec_select:V8HI
+       (vec_select:V8HI
          (vec_concat:V16HI
            (match_operand:V8HI 1 "register_operand" "v")
            (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
                     (const_int 1) (const_int 9)
                     (const_int 2) (const_int 10)
                     (const_int 3) (const_int 11)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (vec_select:V8HI
+         (vec_concat:V16HI
+           (match_operand:V8HI 2 "register_operand" "v")
+           (match_operand:V8HI 1 "register_operand" "v"))
+         (parallel [(const_int 4) (const_int 12)
+                    (const_int 5) (const_int 13)
+                    (const_int 6) (const_int 14)
+                    (const_int 7) (const_int 15)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrghh %0,%1,%2"
   [(set_attr "type" "vecperm")])
@@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
-                        : gen_altivec_vmrglw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+                                                 operands[1],
+                                                 operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+                                                 operands[2],
+                                                 operands[1]));
   DONE;
 })
-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
        (vec_select:VSX_W
          (vec_concat:<VS_double>
@@ -1221,10 +1257,24 @@ (define_insn "altivec_vmrghw_direct_<mode>"
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
          (parallel [(const_int 0) (const_int 4)
                     (const_int 1) (const_int 5)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+       (vec_select:VSX_W
+         (vec_concat:<VS_double>
+           (match_operand:VSX_W 2 "register_operand" "wa,v")
+           (match_operand:VSX_W 1 "register_operand" "wa,v"))
+         (parallel [(const_int 2) (const_int 6)
+                    (const_int 3) (const_int 7)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
-   xxmrghw %x0,%x1,%x2
-   vmrghw %0,%1,%2"
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"
   [(set_attr "type" "vecperm")])
(define_insn "*altivec_vmrghsf"
@@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
    (use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
-                                               : gen_altivec_vmrghb_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
        (vec_select:V16QI
          (vec_concat:V32QI
@@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
                     (const_int 13) (const_int 29)
                     (const_int 14) (const_int 30)
                     (const_int 15) (const_int 31)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+       (vec_select:V16QI
+         (vec_concat:V32QI
+           (match_operand:V16QI 2 "register_operand" "v")
+           (match_operand:V16QI 1 "register_operand" "v"))
+         (parallel [(const_int 0) (const_int 16)
+                    (const_int 1) (const_int 17)
+                    (const_int 2) (const_int 18)
+                    (const_int 3) (const_int 19)
+                    (const_int 4) (const_int 20)
+                    (const_int 5) (const_int 21)
+                    (const_int 6) (const_int 22)
+                    (const_int 7) (const_int 23)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglb %0,%1,%2"
   [(set_attr "type" "vecperm")])
@@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
    (use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
-                                               : gen_altivec_vmrghh_direct;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+      gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+      gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (vec_select:V8HI
          (vec_concat:V16HI
@@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
                     (const_int 5) (const_int 13)
                     (const_int 6) (const_int 14)
                     (const_int 7) (const_int 15)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrglh %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglh_direct_le"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+       (vec_select:V8HI
+         (vec_concat:V16HI
+           (match_operand:V8HI 2 "register_operand" "v")
+           (match_operand:V8HI 1 "register_operand" "v"))
+         (parallel [(const_int 0) (const_int 8)
+                    (const_int 1) (const_int 9)
+                    (const_int 2) (const_int 10)
+                    (const_int 3) (const_int 11)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "vmrglh %0,%1,%2"
   [(set_attr "type" "vecperm")])
@@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
    (use (match_operand:V4SI 2 "register_operand"))]
   "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
-                        : gen_altivec_vmrghw_direct_v4si;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+                                                 operands[1],
+                                                 operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+                                                 operands[2],
+                                                 operands[1]));
   DONE;
 })
-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
   [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
        (vec_select:VSX_W
          (vec_concat:<VS_double>
@@ -1327,10 +1413,24 @@ (define_insn "altivec_vmrglw_direct_<mode>"
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
          (parallel [(const_int 2) (const_int 6)
                     (const_int 3) (const_int 7)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "@
+  xxmrglw %x0,%x1,%x2
+  vmrglw %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+       (vec_select:VSX_W
+         (vec_concat:<VS_double>
+           (match_operand:VSX_W 2 "register_operand" "wa,v")
+           (match_operand:VSX_W 1 "register_operand" "wa,v"))
+         (parallel [(const_int 0) (const_int 4)
+                    (const_int 1) (const_int 5)])))]
+  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
   "@
-   xxmrglw %x0,%x1,%x2
-   vmrglw %0,%1,%2"
+  xxmrglw %x0,%x1,%x2
+  vmrglw %0,%1,%2"
   [(set_attr "type" "vecperm")])
(define_insn "*altivec_vmrglsf"
@@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
     {
       emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
     {
       emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
@@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
     {
       emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
     }
   else
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
     }
   DONE;
 })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..c6ccd40e089 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
rtx op1,
      CODE_FOR_altivec_vpkuwum_direct,
      {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-                     : CODE_FOR_altivec_vmrglb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
+                     : CODE_FOR_altivec_vmrglb_direct_le,
      {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-                     : CODE_FOR_altivec_vmrglh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
+                     : CODE_FOR_altivec_vmrglh_direct_le,
      {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-                     : CODE_FOR_altivec_vmrglw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
+                     : CODE_FOR_altivec_vmrglw_direct_v4si_le,
      {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-                     : CODE_FOR_altivec_vmrghb_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
+                     : CODE_FOR_altivec_vmrghb_direct_le,
      {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-                     : CODE_FOR_altivec_vmrghh_direct,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
+                     : CODE_FOR_altivec_vmrghh_direct_le,
      {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
     {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-                     : CODE_FOR_altivec_vmrghw_direct_v4si,
+     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
+                     : CODE_FOR_altivec_vmrghw_direct_v4si_le,
      {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
     {OPTION_MASK_P8_VECTOR,
      BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..80f84e9b141 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>"
                     (const_int 1) (const_int 5)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-                        : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+                                                 operands[1],
+                                                 operands[2]));
+  else
+    emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+                                                 operands[2],
+                                                 operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
@@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>"
                     (const_int 3) (const_int 7)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-                        : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+                                                 operands[1],
+                                                 operands[2]));
+  else
+    emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+                                                 operands[2],
+                                                 operands[1]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C 
b/gcc/testsuite/g++.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..2cde9b821e3
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
@@ -0,0 +1,120 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+extern "C" void *
+memcpy (void *, const void *, unsigned long);
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    memcpy (out, &x0, 4);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878,      2036477234, 6,
+                     0,          825562964,  1471091955, 1346092787,
+                     506976774,  4197066702, 518848283,  118491664,
+                     0,          0,          0,          0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
--
2.27.0

Reply via email to