On Mon, 8 Jan 2024 06:06:22 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Thanks for the updates!
>>
>> One more idea: Your AVX2 solution has a lot of cost for converting the mask
>> to a permutation. Might it make sense to split this off into a separate
>> vector-node, so that it can float out of a loop if the mask is invariant?
>
> CompressV / ExpandV only accepts two inputs, vector to be operated on and
> mask under which operation is performed, permute table based implementation
> is specific to x86 backend implementation.

@jatin-bhateja I think you can expand them in the matcher into several `MachNode`s that will get scheduled separately.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1880724248
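
For illustration only, here is a scalar Java sketch of why splitting the compress operation into two steps could help when the mask is loop-invariant. This is not HotSpot code: the class and method names (`CompressSketch`, `maskToPermutation`, `permute`) are hypothetical, and plain arrays stand in for vector registers. The first step depends only on the mask, so it can be computed once outside the loop; the second step is the only per-iteration work.

```java
import java.util.Arrays;

public class CompressSketch {
    // Step 1 (depends only on the mask): list the indices of the set lanes
    // first, in order. Unused trailing slots are marked with -1.
    // Hypothetical helper, standing in for the mask-to-permutation conversion.
    static int[] maskToPermutation(boolean[] mask) {
        int[] perm = new int[mask.length];
        Arrays.fill(perm, -1);
        int next = 0;
        for (int lane = 0; lane < mask.length; lane++) {
            if (mask[lane]) {
                perm[next++] = lane;
            }
        }
        return perm;
    }

    // Step 2 (depends on the data): gather source lanes through the
    // permutation and zero-fill the tail, which gives compress semantics.
    static int[] permute(int[] src, int[] perm) {
        int[] dst = new int[src.length];
        for (int i = 0; i < perm.length; i++) {
            dst[i] = perm[i] >= 0 ? src[perm[i]] : 0;
        }
        return dst;
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true, true, false, false, true, false};

        // The mask does not change inside the loop, so the expensive
        // conversion is done once, before the loop.
        int[] perm = maskToPermutation(mask);

        int[][] rows = {
            {10, 11, 12, 13, 14, 15, 16, 17},
            {20, 21, 22, 23, 24, 25, 26, 27},
        };
        for (int[] row : rows) {
            // Only the cheap permute step runs per iteration.
            System.out.println(Arrays.toString(permute(row, perm)));
        }
    }
}
```

In the compiler, the analogous effect would come from the two steps being separate nodes, so that the mask-only step is free to be scheduled outside the loop.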