[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8

2021-03-04 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=429354

Carl Love  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8

2021-03-04 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=429354

Carl Love  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|CONFIRMED   |RESOLVED

--- Comment #8 from Carl Love  ---
closing

[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8

2021-03-04 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=429354

Carl Love  changed:

   What|Removed |Added

 Status|REPORTED|CONFIRMED
 Ever confirmed|0   |1

--- Comment #7 from Carl Love  ---

Patches committed.

commit 3c31707cd3d795bcdd3afc540b5aad75d1e50008 (HEAD -> master, origin/master,
origin/HEAD)
Author: Carl Love 
Date:   Mon Nov 16 19:53:22 2020 -0600

VSX Mask Manipulation operation tests.

commit 82777ee4080a339fb16d970411b743598b00590c
Author: Carl Love 
Date:   Fri May 1 23:49:33 2020 -0500

ISA 3.1 VSX Mask Manipulation Operations

Add support for:

mtvsrbm    Move to VSR Byte Mask
mtvsrbmi   Move To VSR Byte Mask Immediate
mtvsrdm    Move to VSR Doubleword Mask
mtvsrhm    Move to VSR Halfword Mask
mtvsrqm    Move to VSR Quadword Mask
mtvsrwm    Move to VSR Word Mask
vcntmbb    Vector Count Mask Bits Byte
vcntmbd    Vector Count Mask Bits Doubleword
vcntmbh    Vector Count Mask Bits Halfword
vcntmbw    Vector Count Mask Bits Word
vexpandbm  Vector Expand Byte Mask
vexpanddm  Vector Expand Doubleword Mask
vexpandhm  Vector Expand Halfword Mask
vexpandqm  Vector Expand Quadword Mask
vexpandwm  Vector Expand Word Mask
vextractbm Vector Extract Byte Mask
vextractdm Vector Extract Doubleword Mask
vextracthm Vector Extract Halfword Mask
vextractqm Vector Extract Quadword Mask
vextractwm Vector Extract Word Mask
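As an aside for readers following the list above, the mtvsrbm semantics can be modeled in plain scalar C: each bit of the source register becomes an all-ones or all-zeros byte mask in the result. This is an illustrative sketch, not the VEX IR from the patch; the helper name mtvsrbm_half and the little-endian bit-to-lane mapping are assumptions made here for illustration (the ISA's own bit numbering conventions differ).

```c
#include <stdint.h>

/* Illustrative scalar model of mtvsrbm on one 64-bit half of a VSR:
   bit i of the source GPR selects the mask value (0xFF or 0x00) of
   byte lane i.  Lane numbering here is little-endian by assumption. */
uint64_t mtvsrbm_half(uint64_t rb)
{
   uint64_t r = 0;
   for (int i = 0; i < 8; i++)
      if ((rb >> i) & 1)
         r |= 0xFFULL << (8 * i);   /* set all bits of byte lane i */
   return r;
}
```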

[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8

2021-03-04 Thread Julian Seward
https://bugs.kde.org/show_bug.cgi?id=429354

--- Comment #6 from Julian Seward  ---
Thanks for the fixes.  r+; please land.

[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8

2021-02-25 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=429354

Carl Love  changed:

   What|Removed |Added

 Attachment #133461 is obsolete|0   |1

--- Comment #5 from Carl Love  ---
Created attachment 136162
  --> https://bugs.kde.org/attachment.cgi?id=136162&action=edit
Functional support for ISA 3.1, VSX Mask manipulation operations

Updated the patch per comments.

I looked at the suggestions on how to make copy_MSB_bit_fields() more
efficient.  PPC64 can implement the basic algorithm that was outlined on a
single V128 value, except for the sum across lanes, which would need to be
done using a clean helper.  In studying the code, I realized that
copy_MSB_bit_fields() is very similar to the ISA 3.1 instruction vgnb.  The
difference is that the result is stored in the low-order bits rather than the
high-order bits.  I rewrote copy_MSB_bit_fields() using the vgnb
implementation as a base, leveraging the existing clean helpers.  The new
function is much more efficient.  Although I didn't use the suggested
algorithm, it was very helpful to study.  I also realized that a number of the
other instructions in the patch could be implemented much more efficiently
using an arithmetic right shift, without copy_MSB_bit_fields() or the
"for (i = 0; i < max; i++)" loop that Julian had concerns about.

Reimplemented the vexpandbm, vexpanddm, vexpandhm, vexpandwm instructions,
eliminating the call to copy_MSB_bit_fields() and the need for the
for (i = 0; i < max; i++) loop.

Reimplemented the mtvsrbm, mtvsrhm, mtvsrwm, mtvsrdm instructions to remove
the need for the for (i = 0; i < max; i++) loop as well.  The loop has now
been removed entirely, and the new implementations are much more efficient.
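For illustration, the arithmetic-right-shift trick described above can be modeled in scalar C for the byte case: replicating each element's MSB across the whole element is exactly a per-lane arithmetic shift right by (element width - 1). This is a sketch of the vexpandbm effect, not the actual VEX IR from the patch; the helper name vexpandbm_half and the little-endian lane numbering are assumptions for illustration.

```c
#include <stdint.h>

/* Illustrative scalar model of vexpandbm on one 64-bit half of a VSR:
   replicate each byte's MSB across that byte.  A per-lane arithmetic
   shift right by 7 does this directly, which is the reimplementation
   strategy described above. */
uint64_t vexpandbm_half(uint64_t v)
{
   uint64_t r = 0;
   for (int i = 0; i < 8; i++) {
      int8_t lane = (int8_t)((v >> (8 * i)) & 0xFF);
      /* signed shift: 0x80..0xFF -> 0xFF, 0x00..0x7F -> 0x00 */
      r |= ((uint64_t)(uint8_t)(lane >> 7)) << (8 * i);
   }
   return r;
}
```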

[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8

2021-02-15 Thread Julian Seward
https://bugs.kde.org/show_bug.cgi?id=429354

--- Comment #4 from Julian Seward  ---
return a & 0xFF, I meant.

[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8

2021-02-15 Thread Julian Seward
https://bugs.kde.org/show_bug.cgi?id=429354

--- Comment #3 from Julian Seward  ---
(In reply to Julian Seward from comment #2)

>uint16_t getMSBs_8x16(vec128)
>{
>   let hiHalf = vec128[127:64]  // LE numbering
>   let loHalf = vec128[ 63:0]
>   // In each byte lane, copy the MSB to all bit positions
>   hiHalf = shift_right_signed_8x8(hiHalf, 7);
>   loHalf = shift_right_signed_8x8(loHalf, 7);
>   // Now each byte lane is either 0x00 or 0xFF
>   // Make (eg) lane 7 contain either 0x00 or 0x80, lane 6 contain
>   // either 0x00 or 0x40, etc
>   hiHalf &= 0x8040201008040201;
>   loHalf &= 0x8040201008040201;
>   hi8msbs = add_across_lanes_8x8(hiHalf)
>   lo8msbs = add_across_lanes_8x8(loHalf)
>   return (hi8msbs << 8) | lo8msbs;
>}

One more thought, regarding add_across_lanes_8x8().  In fact you can
do this semi-reasonably using standard 64-bit scalar code, because of the
nature of the values involved.  Specifically, we are adding together 8 bytes,
each of which is either zero or it has a 1 bit in a different location.
Hence there will never be any carry-bit propagation at all in the addition,
and so it can be implemented -- for this particular use case only -- as

   uint64_t add_across_lanes_8x8(uint64_t a) {
      a += (a >> 8);
      a += (a >> 16);
      a += (a >> 32);
      return a;
   }

(I *think*)
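To make the no-carry argument concrete, the whole scheme can be sketched in scalar C for one 64-bit half: broadcast each byte's MSB with a per-lane arithmetic shift, mask each lane down to a distinct bit position, then sum with plain 64-bit adds. This is purely an illustration of the idea above, not code from the patch; the per-lane shift is written as a loop only for clarity, and the final "& 0xFF" anticipates the correction in comment #4.

```c
#include <stdint.h>

/* Per-lane arithmetic shift right on 8 byte lanes held in a uint64_t
   (a scalar stand-in for shift_right_signed_8x8 in the sketch above). */
uint64_t shift_right_signed_8x8(uint64_t v, int n)
{
   uint64_t r = 0;
   for (int i = 0; i < 8; i++) {
      int8_t lane = (int8_t)((v >> (8 * i)) & 0xFF);
      r |= ((uint64_t)(uint8_t)(lane >> n)) << (8 * i);
   }
   return r;
}

/* Sum the 8 byte lanes.  After the masking below, each lane holds either
   zero or a 1 bit in a distinct position, so no carry can propagate
   between lanes and plain integer adds suffice (per comment #4, the
   result must be masked down to the low byte). */
uint64_t add_across_lanes_8x8(uint64_t a)
{
   a += (a >> 8);
   a += (a >> 16);
   a += (a >> 32);
   return a & 0xFF;
}

/* MSB of each of 8 byte lanes, gathered into the low 8 bits. */
uint8_t getMSBs_8x8(uint64_t half)
{
   half = shift_right_signed_8x8(half, 7);  /* lanes become 0x00 or 0xFF */
   half &= 0x8040201008040201ULL;           /* one distinct bit per lane */
   return (uint8_t)add_across_lanes_8x8(half);
}
```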

[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8

2021-02-13 Thread Julian Seward
https://bugs.kde.org/show_bug.cgi?id=429354

--- Comment #2 from Julian Seward  ---
==

https://bugs.kde.org/attachment.cgi?id=133462
Test cases for VSX Mask Manipulation operations

OK to land

==

https://bugs.kde.org/attachment.cgi?id=133461
Functional support for ISA 3.1, VSX Mask manipulation operations

I have some concerns about the verbosity/inefficiency of the generated IR.

+static IRExpr * copy_MSB_bit_fields (  IRExpr *src, UInt size )
+{
+   /* Input src is a V128 value.  Input size is the number of bits in each
+    * vector field.  The function copies the MSB of each vector field into
+    * the low order bits of the 64-bit result.
+    */

You can do it like this, but it looks very expensive.  Note that there is
an IROp which (if I understand this correctly) does what you want:

  /* MISC CONVERSION -- get high bits of each byte lane, a la
 x86/amd64 pmovmskb */
  Iop_GetMSBs8x16, /* V128 -> I16 */

You could use that instead (and I'd encourage you to do so), although
it does mean you'd have to handle this in the POWER backend, obviously.

Although now I think about it, I am not sure what the deal is with
endianness -- Iop_GetMSBs8x16 has been used so far only on little-endian
targets.

If Power has an instruction that can add all the lanes of a SIMD value
together, creating a single number, then there's a faster way to do this, that
doesn't involve Iop_GetMSBs8x16.  Something like this:

   uint16_t getMSBs_8x16(vec128)
   {
      let hiHalf = vec128[127:64]  // LE numbering
      let loHalf = vec128[ 63:0]
      // In each byte lane, copy the MSB to all bit positions
      hiHalf = shift_right_signed_8x8(hiHalf, 7);
      loHalf = shift_right_signed_8x8(loHalf, 7);
      // Now each byte lane is either 0x00 or 0xFF
      // Make (eg) lane 7 contain either 0x00 or 0x80, lane 6 contain
      // either 0x00 or 0x40, etc
      hiHalf &= 0x8040201008040201;
      loHalf &= 0x8040201008040201;
      hi8msbs = add_across_lanes_8x8(hiHalf)
      lo8msbs = add_across_lanes_8x8(loHalf)
      return (hi8msbs << 8) | lo8msbs;
   }

There are variants, but you get the general idea from the above.

+   if (IFIELD(theInstr, 1, 5) == 0xA)//mtvsrbmi
+  inst_select = 0x;

Add a comment to explain that this is a special-case hack for mtvsrbmi.

+   default:
+  vex_printf("dis_VSR_byte_mask (opc2)\n");
+  return False;
+   }

Don't print on failure paths; only return False.


+   for (i = 0; i < max; i++) {
+      ones_lo[i+1] = newTemp( Ity_I64 );
+      test_lo[i] = newTemp( Ity_I64 );
+      ones_hi[i+1] = newTemp( Ity_I64 );
+      test_hi[i] = newTemp( Ity_I64 );
..

This seems really inefficient; is there no way to do this somewhat
in parallel using SIMD IROps ?

==

[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8

2020-11-19 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=429354

Carl Love  changed:

   What|Removed |Added

 CC||will_schm...@vnet.ibm.com

[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8

2020-11-19 Thread Carl Love
https://bugs.kde.org/show_bug.cgi?id=429354

--- Comment #1 from Carl Love  ---
Created attachment 133462
  --> https://bugs.kde.org/attachment.cgi?id=133462&action=edit
Test cases for VSX Mask Manipulation operations

Add test case patch
