jorgecarleitao commented on pull request #9882: URL: https://github.com/apache/arrow/pull/9882#issuecomment-813078083
> @jorgecarleitao I am wondering if you have found a more performant implementation or ideas for the `concat` kernel in your `arrow2` branch? The concat before the `transform` module was 6x slower ^_^.

`arrow2` is a bit faster than `arrow`. However, the bottleneck in both implementations is concatenating validities. Buffers are a simple memcpy, but bitmaps require more work because of bit offsets. For example, appending `array2` to `array1` when `array1.len() % 8 != 0` requires shifting the whole bitmap of `array2`. As a result, concatenating arrays with validities is ~2x more expensive than concatenating arrays without them, in both implementations.

https://github.com/jorgecarleitao/arrow2/issues/12 tracks this on the arrow2 side of things. The other potential improvement is to downcast instead of using vtables, since downcasting allows the compiler to inline some operations.
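A minimal sketch of why the validity bitmap is the expensive part, assuming the least-significant-bit-first numbering used by the Arrow columnar format and the invariant that padding bits past `len` are zero. `Bitmap`, `from_bools`, `get`, and `extend` are hypothetical names for illustration, not the `arrow` or `arrow2` API. When the destination length is a multiple of 8 the source bytes are appended as-is (a memcpy); otherwise every source byte straddles two destination bytes and must be shifted:

```rust
/// Hypothetical LSB-first bitmap: bit `i` lives in byte `i / 8` at position `i % 8`.
/// Invariant: bits at positions >= `len` in the last byte are always zero.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Bitmap {
    bytes: Vec<u8>,
    len: usize, // number of bits, not bytes
}

impl Bitmap {
    pub fn from_bools(bits: &[bool]) -> Self {
        let mut b = Bitmap::default();
        for &v in bits {
            b.push(v);
        }
        b
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn get(&self, i: usize) -> bool {
        assert!(i < self.len, "bit index out of bounds");
        (self.bytes[i / 8] >> (i % 8)) & 1 == 1
    }

    pub fn push(&mut self, v: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0); // start a fresh, zeroed byte
        }
        if v {
            self.bytes[self.len / 8] |= 1 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Appends all bits of `other` after the bits of `self`.
    pub fn extend(&mut self, other: &Bitmap) {
        let shift = self.len % 8;
        if shift == 0 {
            // Byte-aligned: the source bytes can be copied verbatim.
            self.bytes.extend_from_slice(&other.bytes);
        } else {
            // Unaligned: each source byte is split across two destination bytes.
            for &byte in &other.bytes {
                // Low (8 - shift) bits fill the free upper part of the current last byte.
                *self.bytes.last_mut().unwrap() |= byte << shift;
                // Remaining high `shift` bits start the next byte.
                self.bytes.push(byte >> (8 - shift));
            }
        }
        self.len += other.len;
        // The shifted loop may push one byte too many; keep exactly ceil(len / 8).
        self.bytes.truncate((self.len + 7) / 8);
    }
}
```

The aligned branch is a single slice copy, while the unaligned branch touches every byte of `other` with two shifts and an OR, which is the extra per-byte work described above. Production implementations typically shift wider words (e.g. `u64`) rather than single bytes, but the alignment problem is the same.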