lidavidm commented on pull request #10520: URL: https://github.com/apache/arrow/pull/10520#issuecomment-861488701
I added a benchmark but am not too happy with the performance. It looks like there's a lot of overhead in the core kernel implementation, and not all that much time is spent actually copying strings. Accordingly, performance increases the more arrays there are to concatenate. The results below are with n=10 arrays to concatenate; with n=100, the throughput goes up to ~1000M/s.

The benchmark is not exactly the same as the one for BinaryJoin, since BinaryJoin can concatenate a varying number of strings per output slot while BinaryJoinElementWise joins a fixed number of strings per output slot. (That, plus the extra options offered by this kernel, could probably explain why there's more overhead here: BinaryJoin doesn't have to deal with unboxing scalars or arrays on every step, tracking nulls, etc. But there are definitely ways to improve the implementation here too.)

```
-------------------------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------
BinaryJoinArrayScalar                104640 ns       104632 ns         6693 bytes_per_second=1112.55M/s
BinaryJoinArrayArray                 115231 ns       115119 ns         6084 bytes_per_second=1011.2M/s
BinaryJoinElementWiseArrayScalar     218594 ns       218524 ns         3206 bytes_per_second=534.461M/s
BinaryJoinElementWiseArrayArray      218135 ns       218114 ns         3209 bytes_per_second=535.466M/s
```
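To make the difference between the two kernels concrete, here is a rough pure-Python model of the two joining shapes. It is only a sketch, not the Arrow implementation: the function names `join_lists` and `join_element_wise` are hypothetical, and the rule that any null input yields a null output slot is an assumption made for illustration.

```python
from typing import List, Optional, Sequence


def join_lists(lists: Sequence[Optional[Sequence[Optional[str]]]],
               separator: str) -> List[Optional[str]]:
    """BinaryJoin-like shape: each output slot joins one list,
    and the lists may have different lengths."""
    out: List[Optional[str]] = []
    for strings in lists:
        # Assumption: a null list or a null element gives a null output slot.
        if strings is None or any(s is None for s in strings):
            out.append(None)
        else:
            out.append(separator.join(strings))
    return out


def join_element_wise(columns: Sequence[Sequence[Optional[str]]],
                      separator: str) -> List[Optional[str]]:
    """BinaryJoinElementWise-like shape: each output slot joins exactly
    one value from each of a fixed number of equal-length columns."""
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all columns must have the same length")
    out: List[Optional[str]] = []
    for row in zip(*columns):
        # Every input has to be inspected for nulls on every row.
        if any(v is None for v in row):
            out.append(None)
        else:
            out.append(separator.join(row))
    return out


# Varying number of strings per output slot.
print(join_lists([["a", "b", "c"], ["d"], None], "-"))
# ['a-b-c', 'd', None]

# Fixed number of strings (three columns) per output slot.
print(join_element_wise([["a", "b"], ["c", None], ["e", "f"]], "-"))
# ['a-c-e', None]
```

In the element-wise shape, the per-row work touches every input column, which is consistent with the per-step overhead mentioned above.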