zanmato1984 commented on code in PR #43389:
URL: https://github.com/apache/arrow/pull/43389#discussion_r1709179399
##########
cpp/src/arrow/acero/swiss_join.cc:
##########
@@ -593,7 +586,8 @@ void RowArrayMerge::CopyVaryingLength(RowTableImpl* target, const RowTableImpl&
   int64_t source_row_id = source_rows_permutation[i];
   const uint64_t* source_row_ptr = reinterpret_cast<const uint64_t*>(
       source.data(2) + source_offsets[source_row_id]);
-  uint32_t length = source_offsets[source_row_id + 1] - source_offsets[source_row_id];
+  int64_t length = source_offsets[source_row_id + 1] - source_offsets[source_row_id];
+  DCHECK_LE(length, std::numeric_limits<uint32_t>::max());
Review Comment:
It is indeed an inherent limitation of the current row table implementation:
a single row cannot be longer than 4GB because, for example, we use
`uint32_t` for the field offset within a row:
https://github.com/apache/arrow/blob/3420c0db2fe49d81bf3caf673e4e1302153a2c49/cpp/src/arrow/compute/row/row_internal.h#L78-L79C25.
It is also unfortunate that the invariant check sits so far from the code
that establishes the invariant, which makes the connection between the two
hard to see. Maybe adding a comment next to the check would help a little?
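
For illustration, such a comment could sit next to the check roughly as follows. This is only a sketch under assumptions: `RowLength` and the `std::vector<int64_t>` offsets are hypothetical stand-ins rather than Arrow's actual API, and plain `assert` is used in place of `DCHECK_LE` so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper, not part of Arrow: returns the byte length of row
// `row_id`, given an offsets array that holds one extra trailing entry.
inline uint32_t RowLength(const std::vector<int64_t>& offsets, int64_t row_id) {
  // Compute the difference in 64 bits so an oversized row is detected
  // by the check below instead of being silently truncated.
  int64_t length = offsets[row_id + 1] - offsets[row_id];
  // Invariant (established elsewhere, in the row table layout): offsets
  // within a single row are stored as uint32_t, so one row can never be
  // longer than what uint32_t can represent.
  assert(length >= 0);
  assert(length <= static_cast<int64_t>(std::numeric_limits<uint32_t>::max()));
  return static_cast<uint32_t>(length);
}
```

The comment names where the limit comes from, so a reader who lands on the check does not have to hunt for the row layout definition to understand it.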