rtpsw commented on PR #13880: URL: https://github.com/apache/arrow/pull/13880#issuecomment-1219591164
> * I haven't seen much use of macros in this way in Acero code. I think those are probably fine would like to hear opinions from maintainers, perhaps @westonpace or @rok Sure. If there is a cleaner way to code this, I'd be happy to do so. > * I still don't full grasp the need for the KeyHasher class, some more elaboration will probably help to understand the justification. Is KeyHasher strictly better than the current hashing logic in hash join node? (if so, we can unify those two?) Or it is better because of the way asof join works? I'll try to explain a bit more. One observation was that the [`Hashing32` logic in `hash_join`](https://github.com/apache/arrow/blob/d880d7517a33f2ac8ff259cad711bc210fd570c5/cpp/src/arrow/compute/exec/hash_join_node.cc#L1164-L1178) is not readily reusable in `AsofJoinNode`, and besides I wanted to use `Hashing64`. So, I needed to write some hashing code in `AsofJoinNode`, and I placed in in this `KeyHasher` class. Another observation was that the `hash_join` code recomputes certain objects, especially [the column metadata](https://github.com/apache/arrow/blob/d880d7517a33f2ac8ff259cad711bc210fd570c5/cpp/src/arrow/compute/light_array.cc#L162-L163) is computed [within `Hashing32::HashBatch`](https://github.com/apache/arrow/blob/d880d7517a33f2ac8ff259cad711bc210fd570c5/cpp/src/arrow/compute/exec/key_hash.cc#L896), which do not change from one batch to another having the same schema. The `KeyHasher` code just caches what (and hopefully all that) can be reused for hashing across these batches. Presumably, the `hash_join` code could leverage similar caching. Surely, there is some good way to unify this and the proposed `KeyHasher` code. However, I think it would be way out of scope for this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org