rtpsw commented on PR #13880:
URL: https://github.com/apache/arrow/pull/13880#issuecomment-1219591164

   >     * I haven't seen much use of macros in this way in Acero code. I think 
those are probably fine would like to hear opinions from maintainers, perhaps 
@westonpace or @rok
   
   Sure. If there is a cleaner way to code this, I'd be happy to do so.
   
   >     * I still don't full grasp the need for the KeyHasher class, some more 
elaboration will probably help to understand the justification. Is KeyHasher 
strictly better than the current hashing logic in hash join node? (if so, we 
can unify those two?) Or it is better because of the way asof join works?
   
   I'll try to explain a bit more. One observation was that the [`Hashing32` 
logic in 
`hash_join`](https://github.com/apache/arrow/blob/d880d7517a33f2ac8ff259cad711bc210fd570c5/cpp/src/arrow/compute/exec/hash_join_node.cc#L1164-L1178)
 is not readily reusable in `AsofJoinNode`, and besides I wanted to use 
`Hashing64`. So, I needed to write some hashing code in `AsofJoinNode`, and I 
placed in in this `KeyHasher` class. Another observation was that the 
`hash_join` code recomputes certain objects, especially [the column 
metadata](https://github.com/apache/arrow/blob/d880d7517a33f2ac8ff259cad711bc210fd570c5/cpp/src/arrow/compute/light_array.cc#L162-L163)
 is computed [within 
`Hashing32::HashBatch`](https://github.com/apache/arrow/blob/d880d7517a33f2ac8ff259cad711bc210fd570c5/cpp/src/arrow/compute/exec/key_hash.cc#L896),
 which do not change from one batch to another having the same schema. The 
`KeyHasher` code just caches what (and hopefully all that) can be reused for 
hashing across these 
 batches.
   
   Presumably, the `hash_join` code could leverage similar caching. Surely, 
there is some good way to unify this and the proposed `KeyHasher` code. 
However, I think it would be way out of scope for this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to