zanmato1984 commented on code in PR #41335:
URL: https://github.com/apache/arrow/pull/41335#discussion_r1600116923
##########
cpp/src/arrow/acero/hash_join_node_test.cc:
##########
@@ -3201,5 +3203,55 @@ TEST(HashJoin, ChainedIntegerHashJoins) {
}
}
+// Test that a large number of joins don't overflow the temp vector stack, like GH-39582
+// and GH-39951.
+TEST(HashJoin, ManyJoins) {
+ // The idea of this case is to create many nested join nodes that may possibly cause
+ // recursive usage of temp vector stack. To make sure that the recursion happens:
+ // 1. A left-deep join tree is created so that the left-most (the final probe side)
+ // table will go through all the hash tables from the right side.
+ // 2. Left-outer join is used so that every join will increase the cardinality.
+ // 3. The left-most table contains rows of unique integers from 0 to N.
+ // 4. Each right table at level i contains two rows of integer i, so that the probing of
+ // each level will increase the result by one row.
+ // 5. The left-most table is a single batch of enough rows, so that at each level, the
+ // probing will accumulate enough result rows to have to output to the subsequent level
+ // before finishing the current batch (releasing the buffer allocated on the temp vector
+ // stack), which is essentially the recursive usage of the temp vector stack.
+
+ // A fair number of joins to guarantee temp vector stack overflow before GH-41335.
+ const int num_joins = 64;
+
+ // `ExecBatchBuilder::num_rows_max()` is the number of rows for swiss join to accumulate
+ // before outputting.
+ const int num_left_rows = ExecBatchBuilder::num_rows_max();
+ ASSERT_OK_AND_ASSIGN(
+ auto left_batches,
+ MakeIntegerBatches({[](int row_id) -> int64_t { return row_id; }},
+ schema({field("l_key", int8())}),
Review Comment:
Ah good catch. I'll change to int32. Thank you!
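
For context, a minimal sketch of why `int8` is presumably too narrow for `l_key`: the left table is meant to hold unique integers, one per row, for `ExecBatchBuilder::num_rows_max()` rows, while `int8` tops out at 127. The helper `CanHoldUniqueRowIds` and the row count 32768 are assumptions made for illustration only; neither is taken from Arrow or from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of Arrow: returns true if the row ids
// 0 .. num_rows - 1 can all be stored as distinct values of type T.
template <typename T>
constexpr bool CanHoldUniqueRowIds(int64_t num_rows) {
  return num_rows <= 0 ||
         num_rows - 1 <= static_cast<int64_t>(std::numeric_limits<T>::max());
}

// Assumed value for illustration only; the test reads the real one from
// ExecBatchBuilder::num_rows_max().
constexpr int64_t kAssumedNumLeftRows = 32768;

static_assert(!CanHoldUniqueRowIds<int8_t>(kAssumedNumLeftRows),
              "int8 cannot represent that many unique row ids");
static_assert(CanHoldUniqueRowIds<int32_t>(kAssumedNumLeftRows),
              "int32 has ample range for the row ids");

// Runtime spot checks at the int8 boundary.
inline void CheckInt8Boundary() {
  assert(CanHoldUniqueRowIds<int8_t>(128));   // ids 0..127 fit
  assert(!CanHoldUniqueRowIds<int8_t>(129));  // id 128 does not
}
```

Under these assumptions, any key type whose maximum is at least the row count minus one keeps the left keys unique, which is what `int32` provides.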