pitrou commented on code in PR #37940:
URL: https://github.com/apache/arrow/pull/37940#discussion_r1344210900
##########
cpp/src/parquet/encoding_test.cc:
##########
@@ -1634,6 +1634,41 @@ TYPED_TEST(TestDeltaBitPackEncoding, NonZeroPaddedMiniblockBitWidth) {
}
}
+TYPED_TEST(TestDeltaBitPackEncoding, DeltaBitPackedWrapping) {
+ using T = typename TypeParam::c_type;
Review Comment:
Can you add a comment referring to the GH issue?
##########
cpp/src/parquet/encoding_test.cc:
##########
@@ -1634,6 +1634,41 @@ TYPED_TEST(TestDeltaBitPackEncoding, NonZeroPaddedMiniblockBitWidth) {
}
}
+TYPED_TEST(TestDeltaBitPackEncoding, DeltaBitPackedWrapping) {
+ using T = typename TypeParam::c_type;
+
+ // Values that should wrap when converted to deltas, and then when converted to the
+ // frame of reference.
+ std::vector<T> int_values = {std::numeric_limits<T>::min(),
+ std::numeric_limits<T>::max(),
+ std::numeric_limits<T>::min(),
+ std::numeric_limits<T>::max(),
+ 0,
+ -1,
+ 0,
+ 1,
+ -1,
+ 1};
+ const int num_values = static_cast<int>(int_values.size());
+
+ auto const encoder = MakeTypedEncoder<TypeParam>(
+ Encoding::DELTA_BINARY_PACKED, /*use_dictionary=*/false, this->descr_.get());
+ encoder->Put(int_values, num_values);
+ auto const encoded = encoder->FlushValues();
+
+ auto const decoder =
+ MakeTypedDecoder<TypeParam>(Encoding::DELTA_BINARY_PACKED, this->descr_.get());
+
+ std::vector<T> decoded(num_values);
+ decoder->SetData(num_values, encoded->data(), static_cast<int>(encoded->size()));
+
+ const int values_decoded = decoder->Decode(decoded.data(), num_values);
+
+ ASSERT_EQ(num_values, values_decoded);
+ ASSERT_NO_FATAL_FAILURE(
+ VerifyResults<T>(decoded.data(), int_values.data(), num_values));
+}
Review Comment:
Since this PR also fixes the encoded data size, can you add a test for that?
For example, check that the encoded buffer size equals a known expected value
for input data that would have triggered the bug.
##########
cpp/src/parquet/encoding.cc:
##########
@@ -2250,17 +2253,17 @@ void DeltaBitPackEncoder<DType>::FlushBlock() {
std::min(values_per_mini_block_, values_current_block_);
const uint32_t start = i * values_per_mini_block_;
- const UT max_delta = *std::max_element(
+ const T max_delta = *std::max_element(
deltas_.begin() + start, deltas_.begin() + start + values_current_mini_block);
// The minimum number of bits required to write any of values in deltas_ vector.
// See overflow comment above.
- const auto bit_width = bit_width_data[i] =
- bit_util::NumRequiredBits(max_delta - min_delta);
+ const auto bit_width = bit_width_data[i] = bit_util::NumRequiredBits(
+ static_cast<UT>(SafeSignedSubtract(max_delta, min_delta)));
Review Comment:
Note that `SafeSignedSubtract` simply does the subtraction in the unsigned
domain, so you could also write `static_cast<UT>(max_delta) -
static_cast<UT>(min_delta)`.
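
To illustrate the point, here is a minimal standalone sketch of subtracting in the unsigned domain. `UnsignedDomainSubtract` and `BitsRequired` are hypothetical names made up for this example; they are not Arrow functions and only stand in for the idea behind `SafeSignedSubtract` and `bit_util::NumRequiredBits`.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not Arrow's API): convert both signed operands to the
// matching unsigned type before subtracting. Unsigned arithmetic wraps modulo
// 2^N, so this avoids the undefined behavior of signed overflow.
template <typename T>
constexpr std::make_unsigned_t<T> UnsignedDomainSubtract(T a, T b) {
  using UT = std::make_unsigned_t<T>;
  // The outer cast matters for types narrower than int, where the operands
  // are promoted to int before the subtraction.
  return static_cast<UT>(static_cast<UT>(a) - static_cast<UT>(b));
}

// Hypothetical stand-in for a "number of required bits" helper: the position
// of the highest set bit, or 0 when the value is 0.
template <typename UT>
constexpr int BitsRequired(UT x) {
  static_assert(std::is_unsigned<UT>::value, "expects an unsigned type");
  int bits = 0;
  while (x != 0) {
    ++bits;
    x >>= 1;
  }
  return bits;
}
```

With `T = int32_t`, subtracting `std::numeric_limits<T>::min()` from `std::numeric_limits<T>::max()` this way yields `0xFFFFFFFF`, which needs the full 32 bits, whereas the same subtraction done directly on the signed values would overflow.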