zanmato1984 commented on code in PR #45612:
URL: https://github.com/apache/arrow/pull/45612#discussion_r1968634470
##########
cpp/src/arrow/acero/swiss_join_internal.h:
##########
@@ -604,6 +609,16 @@ class SwissTableForJoinBuild {
MemoryPool* pool_;
int64_t hardware_flags_;
+ // One per batch.
+ //
+ // Informations like hashes and partitions of each batch.
+ //
+ struct BatchState {
+ std::vector<uint32_t> hashes;
+ std::vector<uint16_t> prtn_ranges;
+ std::vector<uint16_t> prtn_row_ids;
Review Comment:
Yes, this is what I meant in the Overhead section of the PR description
(quoted below).
> ... and worsen the memory profile by 6 bytes per row (4 bytes for hash and
> 2 bytes for row id in partition).
Some more details you may also want to know:
* `prtn_ranges` has one element per partition, not per row.
* The `BatchState` struct itself exists once per batch.
Both therefore have lower space complexity than the per-row vectors.
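To make the arithmetic concrete, here is a rough standalone sketch of the accounting. The struct copies the three fields from the diff above. `PayloadBytes` and `MakeState` are hypothetical helpers that are not part of the PR, and the exact length of `prtn_ranges` (for example, whether it carries a trailing sentinel) is an assumption based on "one element per partition".

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Standalone copy of the fields shown in the diff; in the PR this struct is
// nested inside SwissTableForJoinBuild.
struct BatchState {
  std::vector<uint32_t> hashes;        // one 4-byte hash per row
  std::vector<uint16_t> prtn_ranges;   // one 2-byte element per partition (assumed length)
  std::vector<uint16_t> prtn_row_ids;  // one 2-byte row id per row
};

// Hypothetical helper: element payload held by one BatchState, ignoring
// vector capacity slack and allocator overhead.
inline std::size_t PayloadBytes(const BatchState& s) {
  return s.hashes.size() * sizeof(uint32_t) +
         s.prtn_ranges.size() * sizeof(uint16_t) +
         s.prtn_row_ids.size() * sizeof(uint16_t);
}

// Hypothetical helper: builds a state sized for the given row and partition counts.
inline BatchState MakeState(std::size_t num_rows, std::size_t num_prtns) {
  BatchState s;
  s.hashes.resize(num_rows);
  s.prtn_ranges.resize(num_prtns);
  s.prtn_row_ids.resize(num_rows);
  return s;
}
```

Under these assumptions, a batch of 1024 rows split into 8 partitions holds 1024 * 6 + 8 * 2 = 6160 bytes of payload, so the per-row term dominates and the per-partition term is negligible.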