AntoinePrv commented on PR #47573:
URL: https://github.com/apache/arrow/pull/47573#issuecomment-3406552924

   One hypothesis I'm wondering about is whether mixing scalar code with SIMD 
introduces additional latency.
   
   For comparison, the Lemire implementation I'm curious about loads the data 
once with a `load_unaligned`, then applies a swizzle (byte reorder), then a 
right shift, then a mask.
   For small sizes, we could even do multiple shifts per swizzle and multiple 
swizzles per read (the extreme case being bit_width=1, where we can read 
~256 bits once and write 256 unpacked values).
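
   To make the load / swizzle / shift / mask sequence concrete, here is a 
portable scalar model of it. This is only a sketch, not the Lemire code and 
not an Arrow API: `UnpackFourModel`, `kLanes` and `kRegisterBytes` are 
hypothetical names, and it assumes LSB-first packing, four 32-bit lanes in a 
128-bit register, and 16 readable input bytes. A real SIMD version would do 
steps 2 to 4 with one instruction each instead of a loop over lanes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical constants: model one 128-bit register with four 32-bit lanes.
constexpr int kLanes = 4;
constexpr int kRegisterBytes = 16;

// Hypothetical helper (not an Arrow API): unpack four LSB-first packed values
// of `bit_width` bits each, following the load / swizzle / shift / mask steps.
// The caller must guarantee that 16 bytes are readable at `in`.
std::array<uint32_t, kLanes> UnpackFourModel(const uint8_t* in, int bit_width) {
  // A lane holds the shift remainder (at most 7 bits) plus the value itself,
  // so bit_width is limited to 25 for 32-bit lanes.
  assert(bit_width >= 1 && bit_width <= 25);

  // Step 1: a single unaligned load of the whole register.
  std::array<uint8_t, kRegisterBytes> reg{};
  std::memcpy(reg.data(), in, kRegisterBytes);

  const uint32_t mask = (uint32_t{1} << bit_width) - 1;
  std::array<uint32_t, kLanes> out{};

  for (int lane = 0; lane < kLanes; ++lane) {
    const int bit_offset = lane * bit_width;
    const int first_byte = bit_offset / 8;

    // Step 2: swizzle. Gather the four bytes that contain this lane's value
    // into the lane (little-endian byte order).
    uint32_t v = 0;
    for (int b = 0; b < 4; ++b) {
      v |= static_cast<uint32_t>(reg[first_byte + b]) << (8 * b);
    }

    // Step 3: right shift by the sub-byte offset (differs per lane).
    v >>= bit_offset % 8;

    // Step 4: mask away the bits belonging to the following values.
    out[lane] = v & mask;
  }
  return out;
}
```

   With bit_width=3, the values 5, 2, 7, 1 pack LSB-first into the bytes 
0xD5 0x03, and the model recovers them from a zero-padded 16-byte buffer. 
The per-lane shift amounts are what would map to a variable shift (or to 
several shifts per swizzle for small widths) in a SIMD implementation.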


