Weston Pace created ARROW-17115:
-----------------------------------
Summary: [C++] HashJoin fails if it encounters a batch with more than 32Ki rows
Key: ARROW-17115
URL: https://issues.apache.org/jira/browse/ARROW-17115
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Weston Pace
Assignee: Weston Pace
The new swiss join assumes that its input batches have already been split
according to the morsel/batch model, and that each batch has at most 32Ki rows
(signed 16-bit indices are used in various places).
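The 32Ki figure follows directly from the index width. The sketch below is
only an illustration of that arithmetic; the names kInt16MaxRows and
FitsInt16Index are hypothetical and do not come from the Arrow code base.

```cpp
#include <cstdint>
#include <limits>

// Largest row index a signed 16-bit integer can hold (32767).
constexpr int64_t kInt16MaxIndex = std::numeric_limits<int16_t>::max();

// Indices 0..kInt16MaxIndex address kInt16MaxIndex + 1 rows, i.e. 32Ki.
constexpr int64_t kInt16MaxRows = kInt16MaxIndex + 1;

// Hypothetical check: can every row of a batch with num_rows rows be
// addressed by a non-negative int16_t index?
constexpr bool FitsInt16Index(int64_t num_rows) {
  return num_rows <= kInt16MaxRows;
}

static_assert(kInt16MaxRows == 32 * 1024, "signed 16-bit indices cover 32Ki rows");
```

A batch with one row more than this limit has a last row whose index cannot
be represented as int16_t.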
However, we do not currently slice all of our inputs into batches this small.
This is causing Conbench to fail and would likely be a problem for any large
input.
We should fix this by slicing batches in the engine to the appropriate maximum
size.
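
As a rough sketch of the proposed slicing (not the actual fix), the helper
below computes the (offset, length) ranges that would split a batch of
num_rows rows into pieces of at most 32Ki rows. SliceRanges and
kMaxRowsPerBatch are hypothetical names chosen for illustration. In the engine
each range would presumably be handed to a zero-copy slice call such as
RecordBatch::Slice(offset, length).

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed cap, taken from the description above: 32Ki rows per batch.
constexpr int64_t kMaxRowsPerBatch = 32 * 1024;

// Hypothetical helper: returns consecutive (offset, length) pairs that cover
// rows [0, num_rows), each with a length of at most max_rows.
// Assumes max_rows > 0; returns no ranges when num_rows <= 0.
std::vector<std::pair<int64_t, int64_t>> SliceRanges(
    int64_t num_rows, int64_t max_rows = kMaxRowsPerBatch) {
  std::vector<std::pair<int64_t, int64_t>> ranges;
  for (int64_t offset = 0; offset < num_rows; offset += max_rows) {
    // The final slice may be shorter than max_rows.
    ranges.emplace_back(offset, std::min(max_rows, num_rows - offset));
  }
  return ranges;
}
```

For example, a 100000-row batch would be split into three full 32768-row
slices followed by one 1696-row slice.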