zanmato1984 commented on code in PR #43389:
URL: https://github.com/apache/arrow/pull/43389#discussion_r1709012294


##########
cpp/src/arrow/acero/swiss_join.cc:
##########
@@ -473,17 +473,10 @@ Status RowArrayMerge::PrepareForMerge(RowArray* target,
     (*first_target_row_id)[sources.size()] = num_rows;
   }
 
-  if (num_bytes > std::numeric_limits<uint32_t>::max()) {
-    return Status::Invalid(
-        "There are more than 2^32 bytes of key data.  Acero cannot "
-        "process a join of this magnitude");
-  }
-
   // Allocate target memory
   //
   target->rows_.Clean();
-  RETURN_NOT_OK(target->rows_.AppendEmpty(static_cast<uint32_t>(num_rows),
-                                          static_cast<uint32_t>(num_bytes)));
+  RETURN_NOT_OK(target->rows_.AppendEmpty(static_cast<uint32_t>(num_rows), 
num_bytes));

Review Comment:
   Yes, at least the signature of `AppendEmpty` still requires an `uint32_t` 
`num_rows` argument.
   
   But as you may have already expected, we can actually relax this argument to 
`int64_t` instead. The reason I didn't do it in this PR is that, there exist 
multiple places that are assuming an `uint32_t` number of rows, and I plan to 
clean them up all at once in a future PR. Besides the restriction of using 
`uint32_t` as number of rows is not as impactful as the `uint32_t` total bytes 
(and the subsequent `uint32_t` row offsets) so it's not that hurry.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to