https://bugs.kde.org/show_bug.cgi?id=385055
--- Comment #3 from Julian Seward <jsew...@acm.org> ---
(In reply to Carl Love from comment #1)
> to the code for the xxperm instruction. I verified that the hint was being
> The block has 1312 temporaries before instrumentation. With instrumentation
> we have 4920 temporaries.
>
> So, even with the hint, there are too many instructions in the BB.

Yes. A good solution would be to (drastically) reduce the length of this translation by building it around Iop_Perm8x8 instead. Have a look at math_PSHUFB_XMM in the amd64 front end for an example of how Iop_Perm8x8 is used 4 times to do what I think is the equivalent shuffle.

xxperm has the added complexity that if an index is >= 16, then the value is taken from xT instead. From a quick scan of the sources I can't see whether, in this case,

   result[i] = xT[i]

or

   result[i] = xT[ xB[i] ]

Assuming it's the first, one way to shorten that up is like this:

   vec16s = [16, 16 .. 16]   // 16 of these
   mask = cmpGeU8x16( xB, vec16s )

This gives you a mask which shows, for each lane, whether the result will come from xT or from xA[ xB[i] & 15 ]. Then you can do the permute operation in the style of math_PSHUFB_XMM, and at the end use the normal and-or-not masking with |mask| to copy the relevant bits in from xT instead.
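A scalar C model of the proposed scheme may help when writing the IR. This is a hedged sketch, not Valgrind code: the V128 type, all function names, and the lane layout (lane i lives in byte i % 8 of `lo` for i < 8, else of `hi`) are hypothetical. `perm8x8` only mimics the Iop_Perm8x8 rule result[i] = argL[ argR[i] ], with argR lanes restricted to 0..7. Each 64-bit half of the 16-byte shuffle uses two Perm8x8s (four in total) and selects between them on bit 3 of each index byte, roughly in the spirit of math_PSHUFB_XMM (x86 PSHUFB also zeroes lanes whose index has bit 7 set; that step is not needed here). It then merges xT in under the >= 16 mask, assuming the result[i] = xT[i] reading that the comment leaves open.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: lane i (0..15) is byte i%8 of lo (i < 8) or of hi (i >= 8). */
typedef struct { uint64_t lo, hi; } V128;

static const uint64_t SEVENS = 0x0707070707070707ULL;
static const uint64_t ONES   = 0x0101010101010101ULL;

/* Scalar stand-in for Iop_Perm8x8: res[i] = src[idx[i]]. Index lanes must be
   0..7 (VEX leaves other values undefined), so callers mask with SEVENS first. */
static uint64_t perm8x8(uint64_t src, uint64_t idx) {
    uint64_t res = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sel = (unsigned)(idx >> (8 * i)) & 0xFF;
        assert(sel <= 7);
        res |= ((src >> (8 * sel)) & 0xFFULL) << (8 * i);
    }
    return res;
}

/* 0xFF in every byte lane whose bit 3 is set, else 0x00. */
static uint64_t bit3_mask(uint64_t idx) {
    return ((idx >> 3) & ONES) * 0xFF;
}

/* One 64-bit half of a 16-byte shuffle: pick from src.lo or src.hi by bit 3. */
static uint64_t shuffle_half(V128 src, uint64_t idx) {
    uint64_t i7 = idx & SEVENS;
    uint64_t fromLo = perm8x8(src.lo, i7);
    uint64_t fromHi = perm8x8(src.hi, i7);
    uint64_t m = bit3_mask(idx);
    return (fromHi & m) | (fromLo & ~m);
}

/* 0xFF in every byte lane with unsigned value >= 16, else 0x00. */
static uint64_t ge16_mask(uint64_t idx) {
    uint64_t m = 0;
    for (int i = 0; i < 8; i++)
        if (((idx >> (8 * i)) & 0xFF) >= 16)
            m |= 0xFFULL << (8 * i);
    return m;
}

/* ASSUMED semantics: result[i] = xB[i] >= 16 ? xT[i] : xA[ xB[i] & 15 ]. */
static V128 xxperm_sketch(V128 xA, V128 xB, V128 xT) {
    V128 perm = { shuffle_half(xA, xB.lo), shuffle_half(xA, xB.hi) };
    V128 mask = { ge16_mask(xB.lo), ge16_mask(xB.hi) };
    V128 r;
    r.lo = (xT.lo & mask.lo) | (perm.lo & ~mask.lo);   /* and-or-not merge */
    r.hi = (xT.hi & mask.hi) | (perm.hi & ~mask.hi);
    return r;
}
```

In real IR the per-byte loops would become vector IROps. VEX appears to provide Iop_CmpGT8Ux16 rather than an unsigned greater-or-equal compare, so the mask would presumably be built as xB > 15.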