drin commented on issue #13981: URL: https://github.com/apache/arrow/issues/13981#issuecomment-1228554377
That amount of overhead is definitely unexpected. For reference, I am working on a compute function for hashing and I see 15-25% overhead ([benchmark results](https://docs.google.com/presentation/d/1cUU_F3jB6LsOLbClhl34YdQiodtbz7l76l3juHTsC5k/edit#slide=id.g13e9d117f47_0_63)). That said, some improvements are in the pipeline to address known overheads, such as [ARROW-16756](https://issues.apache.org/jira/browse/ARROW-16756). Also note that per-invocation overhead is amplified when arrays are small, since the fixed cost is paid once per chunk; with many chunks you will measure proportionally more overhead.

Generally speaking: what version of Arrow are you on, and what is the layout of the data? It sounds like you just have Arrays (contiguous), so some of the overheads I have seen may not be affecting you much.

Loosely related: you could consider using a [MapArray](https://arrow.apache.org/docs/cpp/api/array.html#_CPPv4N5arrow8MapArrayE) instead of 2 separate arrays. That should cut out a few steps, because the keys and items share a single top-level validity bitmap instead of each array tracking its own nulls (a rough sketch is at the end of this comment). Separately, this seems like a case where an array-valued version of [map_lookup](https://arrow.apache.org/docs/cpp/compute.html#cpp-compute-vector-structural-transforms) would be nice.

Also, for convenience, would you mind putting your code in a repo or a gist instead of a tarball? I haven't looked at it yet because of the extra steps involved.
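For what it's worth, here is a minimal sketch of the MapArray suggestion using `arrow::MapBuilder`. The key/value types (`utf8 -> int32`) and the builder setup are my own assumptions for illustration, not taken from your code:

```cpp
#include <iostream>
#include <memory>

#include <arrow/api.h>

// Build a MapArray in place of two parallel key/value arrays.
// Types (utf8 keys, int32 items) are assumed for illustration.
arrow::Result<std::shared_ptr<arrow::Array>> BuildMapArray() {
  auto* pool = arrow::default_memory_pool();
  auto key_builder  = std::make_shared<arrow::StringBuilder>(pool);
  auto item_builder = std::make_shared<arrow::Int32Builder>(pool);
  arrow::MapBuilder map_builder(pool, key_builder, item_builder);

  // First map slot: {"a": 1, "b": 2}
  ARROW_RETURN_NOT_OK(map_builder.Append());
  ARROW_RETURN_NOT_OK(key_builder->Append("a"));
  ARROW_RETURN_NOT_OK(item_builder->Append(1));
  ARROW_RETURN_NOT_OK(key_builder->Append("b"));
  ARROW_RETURN_NOT_OK(item_builder->Append(2));

  // Second slot is null: recorded once in the shared top-level validity
  // bitmap, rather than as separate nulls in two parallel arrays.
  ARROW_RETURN_NOT_OK(map_builder.AppendNull());

  std::shared_ptr<arrow::Array> map_array;
  ARROW_RETURN_NOT_OK(map_builder.Finish(&map_array));
  return map_array;
}

int main() {
  auto result = BuildMapArray();
  if (!result.ok()) {
    std::cerr << result.status().ToString() << std::endl;
    return 1;
  }
  std::cout << (*result)->ToString() << std::endl;
  return 0;
}
```

The point is that null bookkeeping happens once at the map level, which is where I'd expect the step reduction to come from.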
