[valgrind] [Bug 385409] s390x: z13 vector integer instructions not implemented

Julian Seward Tue, 25 Sep 2018 03:45:59 -0700

https://bugs.kde.org/show_bug.cgi?id=385409


--- Comment #71 from Julian Seward <jsew...@acm.org> ---
(In reply to Andreas Arnez from comment #68)
> Created attachment 115209 [details]
> s390x: Vector integer and string instruction support

Thank you for finishing this up.  OK to land provided the stuff below is fixed
(none of it is a big deal) and the test in the next para passes.

It is also important to check that all new or modified test cases run
on Memcheck without asserting or otherwise failing.  The regression test
driver will only check them with the "none" tool since the tests are inside
none/tests/s390.  You can do this manually by running the relevant tests
directly on memcheck (just run "by hand").


--- a/VEX/priv/guest_s390_defs.h
+++ b/VEX/priv/guest_s390_defs.h

+/* Arguments of s390x_dirtyhelper_vec_op(...) which are packed into one
+   ULong variable.
+ */
+typedef union {
..
+} s390x_vec_op_details_t;

add STATIC_ASSERT(sizeof(s390x_vec_op_details_t) == sizeof(ULong))


--- a/VEX/priv/guest_s390_helpers.c
+++ b/VEX/priv/guest_s390_helpers.c

+static void
+s390_cc_set_expr(IRExpr* cc)
+{
+   vassert(typeOfIRExpr(irsb->tyenv, cc) == Ity_I64);
+
+   s390_cc_thunk_fill(mkU64(S390_CC_OP_SET), cc, mkU64(0), mkU64(0));
+}

Similar to conversation last week this is again a case where, if |cc| is
used more than once (maybe in the callee of this function) then code for it
will get generated to evaluate it multiple times, and it's quite likely that
those duplicates will not get removed during IR optimisation.  It would be
safer to pass around an IRTemp and use |mkexpr(cc)| instead.

If this is a big amount of work, don't worry; but if it's small and easy
to fix then it would be good to do so now.


+static IRExpr*
+s390_V128_compareLT128x1(IRExpr* arg1, IRExpr* arg2, Bool allow_equal)

Just a sanity check: is the logic with |allow_equal| here correct?
I am not saying it isn't, but I think a second look would not be a bad idea.


+/* Permorms "arg1 + arg2 + carry_out_bit(arg1 + arg2)".

"Performs"


+   if (UNLIKELY(vex_traceflags & VEX_TRACE_FE))
+      s390_disasm(ENC5(MNM, VR, UDXB, VR, UINT), mnm, v1, d2, 0, b2, v3, m4);

and many similar s390_disasm calls: have you checked that s390_disasm can
print all of these new insns?  One way to try is to run your test suites
manually with --trace-flags=10000000 --trace-notbelow=0.  You'll get gigabytes
of output, but you can at least see if easily if s390_disasm aborts, or prints
some kind of "I can't handle this" message.


+   const IROp ops[] = { Iop_InterleaveHI8x16, Iop_InterleaveHI16x8,
+                        Iop_InterleaveHI32x4, Iop_InterleaveHI64x2 };
+   vassert(m4 < sizeof(ops));
+   put_vr_qw(v1, binop(ops[m4], get_vr_qw(v2), get_vr_qw(v3)));

That assertion is surely not right, or at least it assumes that sizeof(IROp)
== 1, which I think isn't true.  Shouldn't it be "m4 <
sizeof(ops)/sizeof(ops[0]))" -- viz, the traditional idiom for "the number of
elements in this array" ?

Please fix, also any other similar that you can find in this diff.  I saw it
also in the following, and maybe there are more?

s390_irgen_VMRH(UChar v1, UChar v2, UChar v3, UChar m4)
s390_irgen_VMRL(UChar v1, UChar v2, UChar v3, UChar m4)
s390_irgen_VPK(UChar v1, UChar v2, UChar v3, UChar m4)
s390_irgen_VUPH(UChar v1, UChar v2, UChar m3)
s390_irgen_VUPLH(UChar v1, UChar v2, UChar m3)
s390_irgen_VUPL(UChar v1, UChar v2, UChar m3)
s390_irgen_VUPLL(UChar v1, UChar v2, UChar m3)
s390_irgen_VPKS(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5)
Also
s390_irgen_VMO(UChar v1, UChar v2, UChar v3, UChar m4) and its friends.


static const HChar *
 s390_irgen_VSEL(UChar v1, UChar v2, UChar v3, UChar v4)
 {
-   IRExpr* vA = get_vr_qw(v3);
-   IRExpr* vB = get_vr_qw(v2);
-   IRExpr* vC = get_vr_qw(v4);
-
-   /* result = (vA & ~vC) | (vB & vC) */
-   put_vr_qw(v1,
-             binop(Iop_OrV128,
-                   binop(Iop_AndV128, vA, unop(Iop_NotV128, vC)),
-                   binop(Iop_AndV128, vB, vC)
-                  )
-            );
+   IRExpr* vIfTrue = get_vr_qw(v2);
+   IRExpr* vIfFalse = get_vr_qw(v3);
+   IRExpr* vCond = get_vr_qw(v4);
+
+   put_vr_qw(v1, s390_V128_bitwiseITE(vCond, vIfTrue, vIfFalse));
    return "vsel";
 }

This kind of thing potentially also suffers from the IRExpr* duplication
problem, assuming that s390_V128_bitwiseITE uses vCond twice.  You could fix
it directly in s390_V128_bitwiseITE by allocating a new temp, binding
(assign()-ing) vCond to that and then using the temp twice (if it doesn't
already do that).


--- a/VEX/priv/host_s390_defs.h
+++ b/VEX/priv/host_s390_defs.h

+   S390_VEC_COUNT_LEADING_ZEROEZ,
+   S390_VEC_COUNT_TRAILING_ZEROEZ,

Really, _ZEROEZ and not _ZEROES ?  Can we use the traditional spelling, unless
ZEROEZ is really what was intended?


--- a/VEX/priv/host_s390_isel.c
+++ b/VEX/priv/host_s390_isel.c

+#define IRCONST_IS_EQUAL_U8(arg, val) \
+   ( ((arg)->tag == Iex_Const) && ((arg)->Iex.Const.con->Ico.U8 == (val)) )

This misses out (or assumes) the check that |arg| has type Ity_I8.  This might
work, but it's somewhat dangerous in the case of programming mistakes.  Please
change it to be

  ((arg)->tag == Iex_Const)
  && ((arg)->Iex.Const.tag == Ico_U8)  <--- the missing check
  && ((arg)->Iex.Const.con->Ico.U8 == (val))


-      s390_unop_t vec_op = 0;
+      UInt vec_op = 0;

Is it necessary to "weaken" the type of vec_op here?


+      case Iop_CmpNEZ128x1: {
+         IRExpr* lowPartNEZ = IRExpr_Unop(Iop_CmpNEZ64,
+                                          IRExpr_Unop(Iop_V128to64, arg));
+         IRExpr* highPartNEZ = IRExpr_Unop(Iop_CmpNEZ64,
+                                           IRExpr_Unop(Iop_V128HIto64, arg));
+         reg1 = s390_isel_int_expr(env,
+                                   IRExpr_Binop(Iop_Or64,
+                                                IRExpr_Unop(Iop_1Sto64,
lowPartNEZ),
+                                                IRExpr_Unop(Iop_1Sto64,
highPartNEZ)
+                                               )
+                                  );
+
+         dst = newVRegV(env);
+         addInstr(env, s390_insn_vec_binop(size, S390_VEC_INIT_FROM_GPRS,
+                                           dst, reg1, reg1));
+
+         return dst;
+      }

This is (I think!) correct but it's also very inefficient.  We're given a
128 bit value |arg| and we want to make |dst| be 0..(126)..0 if |arg| is
|all| zeroes and 1..(126)..1 if arg is all ones.

Better would be like this

   IRExpr* low64     = IRExpr_Unop(Iop_V128to64, arg);
   IRExpr* high64    = IRExpr_Unop(Iop_V128HIto64, arg);
   IRExpr* both      = IRExpr_Binop(Iop_Or64, low64, high64);
   IRExpr* anyNonZ   = IRExpr_Unop(Iop_CmpNEZ64, both);
   IRExpr* anyNonZ64 = IRExpr_Unop(Iop_1Sto64, anyNonZ);
   reg1 = s390_isel_int_expr(env, anyNonZ64);

Even then, it's pretty much a hack to create s390 code by first generating
*even more* IR and then handing that off to some other part of the instruction
selector.  It also means that the tree for |arg| will get instruction-selected
twice.

Much better would just be to generate the relevant s390 insns directly.  But
fixing that is beyond the scope of this round of reviewing/bug fixing.


+      case Iop_PwAddL8Ux16: {
+         /* There is no such instruction. We have to emulate it. */
+         dst = s390_isel_vec_expr(env,
+                                  IRExpr_Binop(Iop_Add16x8,
+                                              
IRExpr_Binop(Iop_InterleaveEvenLanes8x16,
+                                                           
IRExpr_Const(IRConst_V128(0x00)),
+                                                            arg),
+                                              
IRExpr_Binop(Iop_InterleaveOddLanes8x16,
+                                                           
IRExpr_Const(IRConst_V128(0x00)),
+                                                            arg)
+                                              )
+                                 );
+
+         return dst;
+      }

ditto (re "pretty much a hack".)  At least, please format to fit inside 80
cols.

Please write the arguments to IRConst_V128 here as 0x0000.  This is 
consistent with the style of other use of it and emphasises the fact
that the argument is a 16-bit value (one bit for each byte of a 
16 byte vector).


+      case Iop_PwAddL16Ux8:
+         if ( (arg->tag == Iex_Unop) && (arg->Iex.Unop.op == Iop_PwAddL8Ux16)
)
+         {
+            size = 1;
+            arg = arg->Iex.Unop.arg;
+         }
+         else
+         {
+            size = 2;
+         }
+         vec_op = S390_VEC_PWSUM_W;
+         goto Iop_Pairwise_wrk;

Please change to use the house style, not so verbose:

+      case Iop_PwAddL16Ux8:
+         if (arg->tag == Iex_Unop && arg->Iex.Unop.op == Iop_PwAddL8Ux16) {
+            size = 1;
+            arg = arg->Iex.Unop.arg;
+         } else {
+            size = 2;
+         }
+         vec_op = S390_VEC_PWSUM_W;
+         goto Iop_Pairwise_wrk;

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 385409] s390x: z13 vector integer instructions not implemented

Reply via email to