https://bugs.kde.org/show_bug.cgi?id=385055

--- Comment #3 from Julian Seward <jsew...@acm.org> ---
(In reply to Carl Love from comment #1)

> to the code for the xxperm instruction.  I verified that the hint was being

> The block has 1312 temporaries before instrumentation.  With instrumentation
> we have 4920 temporaries.  
> 
> So, even with the hint, there are too many instructions in the BB.

Yes.

A good solution would be to (drastically) reduce the length of this
translation by building it around Iop_Perm8x8 instead.  Have a look
at math_PSHUFB_XMM in the amd64 front end for an example of how 
Iop_Perm8x8 is used 4 times to do what I think is the equivalent
shuffle.
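A minimal C model of the four-Perm8x8 idea, for illustration only. It assumes Iop_Perm8x8 behaves as res[i] = a[idx[i]] on the eight byte lanes of a 64-bit value (lane 0 = least significant byte), with every idx[i] in 0..7. The names V128, perm8x8, bit3_lanes, shuffle_half and shuffle16 are hypothetical and are not VEX functions. Each 64-bit half of the result permutes both halves of the source by (index & 7), then bit 3 of the index picks which of the two permuted bytes to keep. That is 2 halves x 2 permutes = 4 uses. Unlike PSHUFB, this sketch does not zero lanes whose index has bit 7 set.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit value as two 64-bit halves: byte lane 0 is the least significant
   byte of lo, lane 15 the most significant byte of hi. */
typedef struct { uint64_t hi, lo; } V128;

/* Model of Iop_Perm8x8 (assumed semantics): res[i] = a[idx[i]] for the
   eight byte lanes, where every idx[i] must be in 0..7. */
static uint64_t perm8x8(uint64_t a, uint64_t idx)
{
    uint64_t res = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sel = (unsigned)(idx >> (8 * i)) & 0xFF;
        assert(sel < 8);  /* out-of-range indices are not allowed */
        res |= ((a >> (8 * sel)) & 0xFF) << (8 * i);
    }
    return res;
}

/* 0xFF in each byte lane whose bit 3 is set, 0x00 elsewhere; the same
   effect as the shift pair SarN8x8(ShlN8x8(s, 4), 7). */
static uint64_t bit3_lanes(uint64_t s)
{
    uint64_t m = 0;
    for (int i = 0; i < 8; i++)
        if ((s >> (8 * i + 3)) & 1)
            m |= (uint64_t)0xFF << (8 * i);
    return m;
}

/* One 64-bit half of res[i] = d[s[i] & 15]: permute each half of d by
   (s & 7), then use bit 3 of each index to choose the half. */
static uint64_t shuffle_half(V128 d, uint64_t s)
{
    uint64_t sAnd7  = s & 0x0707070707070707ULL;
    uint64_t fromHi = bit3_lanes(s);
    return (perm8x8(d.hi, sAnd7) & fromHi) | (perm8x8(d.lo, sAnd7) & ~fromHi);
}

/* Full 16-byte shuffle built from four perm8x8 calls. */
static V128 shuffle16(V128 d, V128 s)
{
    V128 r = { shuffle_half(d, s.hi), shuffle_half(d, s.lo) };
    return r;
}
```

With d holding bytes 0..15 in lanes 0..15, an index vector of 15, 14, .. 0 produces the byte-reversed vector.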

xxperm has the added complexity that if an index is >= 16, the value
is taken from xT instead.  From a quick scan of the sources, I can't
tell whether, in this case,

  result[i] = xT[i]

or

  result[i] = xT[ xB[i] ]
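A byte-at-a-time reference model of both readings could be handy for checking a faster translation against either one. This is a sketch under stated assumptions: only the low 5 bits of each xB byte are taken as the index, and the names xt_reading, XT_SAME_LANE, XT_INDEXED and xxperm_ref are hypothetical. The result goes to a separate buffer because the instruction overwrites xT, which is also an input.

```c
#include <assert.h>
#include <stdint.h>

/* The two candidate readings of an index >= 16 (hypothetical names). */
enum xt_reading { XT_SAME_LANE, XT_INDEXED };

/* Reference model: index 0..15 selects a byte of xA; 16..31 selects from
   xT according to `reading`.  Assumes only the low 5 bits of each xB byte
   form the index.  res must not alias xT. */
static void xxperm_ref(const uint8_t xA[16], const uint8_t xB[16],
                       const uint8_t xT[16], enum xt_reading reading,
                       uint8_t res[16])
{
    assert(reading == XT_SAME_LANE || reading == XT_INDEXED);
    for (int i = 0; i < 16; i++) {
        unsigned idx = xB[i] & 0x1F;
        if (idx < 16)
            res[i] = xA[idx];
        else if (reading == XT_SAME_LANE)
            res[i] = xT[i];           /* result[i] = xT[i] */
        else
            res[i] = xT[idx - 16];    /* result[i] = xT[ xB[i] ], rebased to 0..15 */
    }
}
```

The two readings only differ in lanes whose index is >= 16, so a test that sets a few such lanes to indices other than their own lane number tells them apart.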

Assuming it's the first, here is one way to shorten it:

  vec16s = [16, 16 .. 16]  // 16 of these

  mask   = cmpGeU8x16( xB, vec16s )

This gives you a mask which shows, for each lane, whether the result
will come from xT or from xA[ xB[i] & 15 ].  Then you can do the
permute operation in the style of math_PSHUFB_XMM, and at the end use
the normal and-or-not masking with |mask| to copy the relevant bits
in from xT instead.
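The mask and the final and-or-not merge can be sketched in C on one 64-bit half (eight byte lanes); the 128-bit operation is the same thing applied to both halves. The permuted value is taken as an input, however it was produced. Assumptions: only the low 5 bits of each xB byte are the index, so they are isolated before the compare, and cmp_ge_u8x8, xt_mask and merge_xt are hypothetical names, not VEX primitives.

```c
#include <stdint.h>

/* Per-lane unsigned compare on eight byte lanes: 0xFF where a[i] >= b[i],
   0x00 elsewhere (a scalar stand-in for half of a cmpGeU8x16). */
static uint64_t cmp_ge_u8x8(uint64_t a, uint64_t b)
{
    uint64_t m = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        if (x >= y)
            m |= (uint64_t)0xFF << (8 * i);
    }
    return m;
}

/* mask = cmpGeU(xB, vec16s) for one half, after keeping only the low
   5 bits of each index byte (assumed index width). */
static uint64_t xt_mask(uint64_t xB)
{
    const uint64_t vec16s = 0x1010101010101010ULL;
    return cmp_ge_u8x8(xB & 0x1F1F1F1F1F1F1F1FULL, vec16s);
}

/* And-or-not merge: lanes with the mask set take xT, all others keep the
   byte produced by the permute of xA. */
static uint64_t merge_xt(uint64_t permd, uint64_t xB, uint64_t xT)
{
    uint64_t mask = xt_mask(xB);
    return (permd & ~mask) | (xT & mask);
}
```

With a 5-bit index, "index >= 16" is the same as "bit 4 is set", so the compare could also be replaced by a shift-based lane mask like the one used for bit 3 in the shuffle.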
