metsw24-max commented on code in PR #50089:
URL: https://github.com/apache/arrow/pull/50089#discussion_r3403357056
##########
cpp/src/arrow/util/rle_encoding_internal.h:
##########
@@ -371,15 +371,15 @@ class BitPackedRunDecoder {
/// left.
[[nodiscard]] rle_size_t GetBatch(value_type* out, rle_size_t batch_size,
rle_size_t value_bit_width) {
- const int bits_read = values_read_ * value_bit_width;
- const int bytes_fully_read = bits_read / 8;
+ const int64_t bits_read = static_cast<int64_t>(values_read_) * value_bit_width;
+ const int64_t bytes_fully_read = bits_read / 8;
const uint8_t* unread_data = data_ + bytes_fully_read;
const ::arrow::internal::UnpackOptions opts{
/* .batch_size= */ std::min(batch_size, remaining()),
/* .bit_width= */ value_bit_width,
- /* .bit_offset= */ bits_read % 8,
- /* .max_read_bytes= */ max_read_bytes_ - bytes_fully_read,
+ /* .bit_offset= */ static_cast<int>(bits_read % 8),
+ /* .max_read_bytes= */ static_cast<int>(max_read_bytes_ - bytes_fully_read),
Review Comment:
For parser-produced runs it does hold. `PeekImpl` only emits a run whose
whole payload fits in the remaining buffer (it truncates or rejects the run
otherwise), and the `BitPackedRun` constructor DCHECKs that invariant. Since
`values_read_ <= values_count_`, `bytes_fully_read` can never exceed the
payload size, so `bytes_fully_read <= max_read_bytes_` and the difference
stays in `[0, max_read_bytes_]`, a range that itself fits in `rle_size_t`.
The only other case is the negative sentinel (no bound). There the difference
stays negative, and `unpack` treats any negative value the same as -1. I added
a DCHECK at the subtraction site to make this explicit.
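A minimal standalone sketch of the bound argued above follows. It assumes `rle_size_t` is `int32_t`, and it uses hypothetical helpers (`PayloadBytes`, `PositionAfter`, `RemainingReadBytes`, `BudgetStaysInRange`) in place of the actual decoder members. It only mirrors the arithmetic in `GetBatch`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: rle_size_t is a 32-bit signed integer.
using rle_size_t = int32_t;

// Hypothetical helper: bytes occupied by a bit-packed payload of `count`
// values, rounded up to a whole byte.
inline int64_t PayloadBytes(rle_size_t count, rle_size_t bit_width) {
  return (static_cast<int64_t>(count) * bit_width + 7) / 8;
}

// Byte and bit position after `values_read` values have been consumed.
struct ReadPosition {
  int64_t bytes_fully_read;
  int bit_offset;  // always in [0, 7]
};

// The product is widened to 64 bits before multiplying, so it stays exact
// even when values_read * bit_width exceeds INT32_MAX.
inline ReadPosition PositionAfter(rle_size_t values_read, rle_size_t bit_width) {
  const int64_t bits_read = static_cast<int64_t>(values_read) * bit_width;
  return {bits_read / 8, static_cast<int>(bits_read % 8)};
}

// Byte budget left for the unpacker. A negative max_read_bytes stands in for
// the "no bound" sentinel, and the result then stays negative as well.
inline int64_t RemainingReadBytes(rle_size_t max_read_bytes, rle_size_t values_read,
                                  rle_size_t bit_width) {
  return max_read_bytes - PositionAfter(values_read, bit_width).bytes_fully_read;
}

// If the whole payload fits in max_read_bytes (the guarantee attributed to
// PeekImpl), every values_read in [0, values_count] leaves a budget in
// [0, max_read_bytes].
inline bool BudgetStaysInRange(rle_size_t values_count, rle_size_t bit_width,
                               rle_size_t max_read_bytes) {
  if (PayloadBytes(values_count, bit_width) > max_read_bytes) {
    return false;  // precondition violated: such a run would not be emitted
  }
  for (rle_size_t values_read = 0; values_read <= values_count; ++values_read) {
    const int64_t remaining = RemainingReadBytes(max_read_bytes, values_read, bit_width);
    if (remaining < 0 || remaining > max_read_bytes) return false;
  }
  return true;
}
```

Because the difference is only narrowed to `int` after its range is known to be `[0, max_read_bytes]` (or negative for the sentinel), the `static_cast<int>` cannot change its value in the bounded case.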