Thanks,

On 2023/3/31 03:30, Segher Boessenkool wrote:
Hi!

On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
The native RTL expression for vec_mrghw should be the same for BE and LE,
as they are register- and endian-independent.

This isn't so obvious at all.  All elements of these constructs are
very much not endian-independent, because of very unfortunate choices
in the meaning of some RTL constructs.  It is possible all things in
this negate all other things, but please show that then.

  So both BE and LE need to
generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
                   (subreg:V4SI (reg:V16QI 139) 0)
                   (subreg:V4SI (reg:V16QI 140) 0))
                   [const_int 0 4 1 5]))

With BE, if the source vecs are ABCD and EFGH, the vec_concat gives
ABCDEFGH, and the vec_select then gives AEBF.

What happens for LE?

On LE, the sources look like DCBA and HGFE in the registers, and vec_concat gives HGFEDCBA with the indices reversed [7 6 5 4 3 2 1 0], so the select [0 4 1 5] picks FBEA, i.e. the same elements A, E, B, F as on BE.


Take this case as an example on P8LE:

test.c

#include <altivec.h>
#include <stdio.h>

__attribute__ ((__noinline__))
vector int bar (vector int a, vector int b)
{
  return vec_vmrghw (a, b);
}

int main ()
{

  vector int a = {0xa1345678, 0xa2345678,0xa3345678, 0xa4345678};
  vector int b = {0xb1345678, 0xb2345678,0xb3345678, 0xb4345678};
  vector int c = bar (a, b);
  printf("%x,%x,%x,%x\n", c[0], c[1], c[2], c[3]);
  return c[0];
}


.expand:

_3 = VEC_PERM_EXPR <a_1(D), b_2(D), { 0, 4, 1, 5 }>;

(insn 7 4 8 2 (set (reg:V16QI 122)
        (subreg:V16QI (reg/v:V4SI 118 [ a ]) 0)) "test.c":15:10 -1
     (nil))
(insn 8 7 9 2 (set (reg:V16QI 123)
        (subreg:V16QI (reg/v:V4SI 119 [ b ]) 0)) "test.c":15:10 -1
     (nil))
(insn 9 8 10 2 (set (reg:V4SI 124)
        (vec_select:V4SI (vec_concat:V8SI (subreg:V4SI (reg:V16QI 122) 0)
                (subreg:V4SI (reg:V16QI 123) 0))
            (parallel [
                    (const_int 0 [0])
                    (const_int 4 [0x4])
                    (const_int 1 [0x1])
                    (const_int 5 [0x5])
                ]))) "test.c":15:10 -1
     (nil))


And .vregs to .final:

(insn 15 9 16 (set (reg/i:V4SI 66 %v2)
        (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 66 %v2 [125])
                (reg:V4SI 67 %v3 [126]))
            (parallel [
                    (const_int 0 [0])
                    (const_int 4 [0x4])
                    (const_int 1 [0x1])
                    (const_int 5 [0x5])
                ]))) "test.c":16:1 1825 {altivec_vmrglw_direct_v4si_le}
     (expr_list:REG_DEAD (reg:V4SI 67 %v3 [126])
        (nil)))


With this patch, altivec_vmrglw_direct_v4si_le is defined as:


(define_insn "altivec_vmrglw_direct_<mode>_le"
  [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
        (vec_select:VSX_W
          (vec_concat:<VS_double>
            (match_operand:VSX_W 2 "register_operand" "wa,v")
            (match_operand:VSX_W 1 "register_operand" "wa,v"))
          (parallel [(const_int 0) (const_int 4)
                     (const_int 1) (const_int 5)])))]
  "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
  "@
   xxmrglw %x0,%x1,%x2
   vmrglw %0,%1,%2"
  [(set_attr "type" "vecperm")])


ASM:

bar:
.LFB11:
        .cfi_startproc
        xxmrglw 34,35,34
        blr


./test
a1345678,b1345678,a2345678,b2345678

This exactly matches [a1 b1 a2 b2].  Does this look reasonable?


BR,
Xionghu
