jorgecarleitao commented on pull request #9882:
URL: https://github.com/apache/arrow/pull/9882#issuecomment-813078083


   > @jorgecarleitao I am wondering if you have found a more performant 
implementation or ideas for the `concat` kernel in your `arrow2` branch?
   
   Before the `transform` module, `concat` was 6x slower ^_^. `arrow2` is a 
bit faster than `arrow`. However, the bottleneck in both implementations is 
concatenating validities. While buffers are a simple memcopy, bitmaps require 
more work due to the bit offsets. E.g. concatenating an `array2` onto an 
`array1` where `array1.len() % 8 != 0` requires shifting the whole bitmap of 
`array2`. This causes concatenating with validities to be ~2x more expensive 
than without validities, in both implementations. 
https://github.com/jorgecarleitao/arrow2/issues/12 tracks this on the arrow2 
side of things.
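
   To illustrate why the unaligned case costs more, here is a minimal sketch 
of appending one validity bitmap to another. It is not the code from `arrow` 
or `arrow2`; `extend_bitmap` and its parameters are hypothetical names, and 
it assumes LSB-first bit order within each byte, as in the Arrow format.

```rust
/// Appends the first `src_len` bits of `src` to `dst`, which currently holds
/// `dst_len` valid bits. Returns the new length in bits.
/// Hypothetical helper for illustration; bits are LSB-first within each byte.
fn extend_bitmap(dst: &mut Vec<u8>, dst_len: usize, src: &[u8], src_len: usize) -> usize {
    let dst_bytes = (dst_len + 7) / 8;
    let src_bytes = (src_len + 7) / 8;
    assert!(dst.len() >= dst_bytes && src.len() >= src_bytes);

    // Drop any bytes beyond the valid bits of `dst`.
    dst.truncate(dst_bytes);
    let src = &src[..src_bytes];
    let offset = dst_len % 8;

    if offset == 0 {
        // Byte-aligned: a plain memcopy of the source bytes.
        dst.extend_from_slice(src);
    } else {
        // Unaligned: clear stale bits in the last byte, then split every
        // source byte across two destination bytes.
        let last = dst.len() - 1;
        dst[last] &= (1u8 << offset) - 1;
        for &byte in src {
            let last = dst.len() - 1;
            dst[last] |= byte << offset;
            dst.push(byte >> (8 - offset));
        }
    }

    // Trim to the exact number of bytes and zero the unused trailing bits.
    let new_len = dst_len + src_len;
    dst.truncate((new_len + 7) / 8);
    let rem = new_len % 8;
    if rem != 0 {
        if let Some(last) = dst.last_mut() {
            *last &= (1u8 << rem) - 1;
        }
    }
    new_len
}
```

   In the aligned branch the whole source is copied in one call; in the 
unaligned branch every source byte needs two shifts and two writes, which is 
the extra work described above.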
   
   The other potential improvement is to downcast to the concrete array type 
instead of dispatching through vtables, since downcasting allows the compiler 
to inline some operations.
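
   As a rough sketch of the difference, the example below contrasts 
per-element dynamic dispatch with a single downcast followed by a loop over 
the concrete type. The `Array` trait, `Int64Array` struct and both `sum_*` 
functions are simplified, hypothetical stand-ins, not the real `arrow` or 
`arrow2` types.

```rust
use std::any::Any;

// Hypothetical, simplified stand-in for an arrow-style array trait.
trait Array {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
    // Per-element accessor that is called through the vtable.
    fn get_i64(&self, i: usize) -> i64;
}

struct Int64Array {
    values: Vec<i64>,
}

impl Array for Int64Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.values.len()
    }
    fn get_i64(&self, i: usize) -> i64 {
        self.values[i]
    }
}

// Dynamic dispatch: one virtual call per element, which the compiler
// generally cannot inline.
fn sum_dyn(array: &dyn Array) -> i64 {
    (0..array.len()).map(|i| array.get_i64(i)).sum()
}

// Downcast once, then loop over the concrete slice; the loop body is
// visible to the compiler and can be inlined.
fn sum_downcast(array: &dyn Array) -> Option<i64> {
    let typed = array.as_any().downcast_ref::<Int64Array>()?;
    Some(typed.values.iter().sum())
}
```

   Both functions return the same sum for an `Int64Array`; `sum_downcast` 
returns `None` when the array is of another type, so a caller would need one 
branch per supported type.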
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

