lidavidm commented on pull request #10520:
URL: https://github.com/apache/arrow/pull/10520#issuecomment-861488701


   I added a benchmark, but I'm not too happy with the performance. It looks 
like there's a lot of overhead in the core kernel implementation, and not much 
of the time is spent actually copying strings. Accordingly, throughput improves 
as the number of arrays to concatenate grows: the numbers below are for n=10 
arrays, but with n=100 the bytes per second goes up to ~1000M/s. 
   
   The benchmark is not exactly the same as the one for BinaryJoin, since 
BinaryJoin can concatenate a varying number of strings per output slot, while 
BinaryJoinElementWise joins a fixed number of strings per output slot. (That, 
plus the extra options offered by this kernel, could probably explain why 
there's more overhead here: BinaryJoin doesn't have to deal with unboxing 
scalars or arrays on every step, tracking nulls, etc. But there are definitely 
ways to improve the implementation here too.)
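
To make the difference between the two shapes concrete, here is a rough pure-Python sketch of the semantics (not the actual C++ kernels). The function names, the `length` parameter, and the "any null input gives a null output" behavior are assumptions chosen for illustration; the real element-wise kernel has additional null-handling options that this does not model.

```python
from typing import List, Optional, Sequence, Union

# A "column" is either a scalar (str or None) or an array of optional strings.
Column = Union[Optional[str], Sequence[Optional[str]]]


def binary_join(lists: Sequence[Optional[Sequence[Optional[str]]]],
                separator: str) -> List[Optional[str]]:
    """Join a *varying* number of strings per output slot (one list per slot)."""
    out: List[Optional[str]] = []
    for values in lists:
        # Assumption: a null list or a null element yields a null slot.
        if values is None or any(v is None for v in values):
            out.append(None)
        else:
            out.append(separator.join(values))
    return out


def binary_join_element_wise(columns: Sequence[Column], separator: str,
                             length: int) -> List[Optional[str]]:
    """Join a *fixed* number of columns, slot by slot."""
    out: List[Optional[str]] = []
    for i in range(length):
        # Every slot has to look at every input and decide whether it is a
        # scalar (broadcast) or an array (indexed): the per-step "unboxing".
        row = [col if (col is None or isinstance(col, str)) else col[i]
               for col in columns]
        # Assumption: emit null if any input is null for this slot.
        if any(v is None for v in row):
            out.append(None)
        else:
            out.append(separator.join(row))
    return out


print(binary_join([["a", "b", "c"], ["d"], None], "-"))
print(binary_join_element_wise([["a", "b"], "x", ["c", None]], "-", 2))
```

In this sketch the element-wise version does its scalar-or-array check and null check once per input per slot, which is the kind of per-step bookkeeping described above.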
   
   ```
   -------------------------------------------------------------------------------------------
   Benchmark                                 Time             CPU   Iterations UserCounters...
   -------------------------------------------------------------------------------------------
   BinaryJoinArrayScalar                104640 ns       104632 ns         6693 bytes_per_second=1112.55M/s
   BinaryJoinArrayArray                 115231 ns       115119 ns         6084 bytes_per_second=1011.2M/s
   BinaryJoinElementWiseArrayScalar     218594 ns       218524 ns         3206 bytes_per_second=534.461M/s
   BinaryJoinElementWiseArrayArray      218135 ns       218114 ns         3209 bytes_per_second=535.466M/s
   ```
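
As a quick sanity check on the gap, the reported throughputs can be compared directly. This is just arithmetic on the `bytes_per_second` values above (the dictionary and helper names are made up for this snippet), and since the two benchmarks are not exactly equivalent, the ratio is only a rough indicator: BinaryJoinElementWise comes out at roughly half the throughput of BinaryJoin.

```python
# bytes_per_second values (in M/s) copied from the benchmark output above.
throughput = {
    "BinaryJoinArrayScalar": 1112.55,
    "BinaryJoinArrayArray": 1011.2,
    "BinaryJoinElementWiseArrayScalar": 534.461,
    "BinaryJoinElementWiseArrayArray": 535.466,
}


def slowdown(baseline: str, candidate: str) -> float:
    """How many times lower the candidate's throughput is than the baseline's."""
    return throughput[baseline] / throughput[candidate]


array_scalar = slowdown("BinaryJoinArrayScalar",
                        "BinaryJoinElementWiseArrayScalar")
array_array = slowdown("BinaryJoinArrayArray",
                       "BinaryJoinElementWiseArrayArray")

print(f"ArrayScalar: BinaryJoin is {array_scalar:.2f}x faster")
print(f"ArrayArray:  BinaryJoin is {array_array:.2f}x faster")
```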


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

