Re: [PR] GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED [arrow]

via GitHub Tue, 03 Oct 2023 07:33:37 -0700


pitrou commented on code in PR #37940:
URL: https://github.com/apache/arrow/pull/37940#discussion_r1344210900



##########
cpp/src/parquet/encoding_test.cc:
##########
@@ -1634,6 +1634,41 @@ TYPED_TEST(TestDeltaBitPackEncoding, NonZeroPaddedMiniblockBitWidth) {
   }
 }
 
+TYPED_TEST(TestDeltaBitPackEncoding, DeltaBitPackedWrapping) {
+  using T = typename TypeParam::c_type;

Review Comment:
   Can you add a comment referring to the GH issue?



##########
cpp/src/parquet/encoding_test.cc:
##########
@@ -1634,6 +1634,41 @@ TYPED_TEST(TestDeltaBitPackEncoding, NonZeroPaddedMiniblockBitWidth) {
   }
 }
 
+TYPED_TEST(TestDeltaBitPackEncoding, DeltaBitPackedWrapping) {
+  using T = typename TypeParam::c_type;
+
+  // Values that should wrap when converted to deltas, and then when converted to the
+  // frame of reference.
+  std::vector<T> int_values = {std::numeric_limits<T>::min(),
+                               std::numeric_limits<T>::max(),
+                               std::numeric_limits<T>::min(),
+                               std::numeric_limits<T>::max(),
+                               0,
+                               -1,
+                               0,
+                               1,
+                               -1,
+                               1};
+  const int num_values = static_cast<int>(int_values.size());
+
+  auto const encoder = MakeTypedEncoder<TypeParam>(
+      Encoding::DELTA_BINARY_PACKED, /*use_dictionary=*/false, this->descr_.get());
+  encoder->Put(int_values, num_values);
+  auto const encoded = encoder->FlushValues();
+
+  auto const decoder =
+      MakeTypedDecoder<TypeParam>(Encoding::DELTA_BINARY_PACKED, this->descr_.get());
+
+  std::vector<T> decoded(num_values);
+  decoder->SetData(num_values, encoded->data(), static_cast<int>(encoded->size()));
+
+  const int values_decoded = decoder->Decode(decoded.data(), num_values);
+
+  ASSERT_EQ(num_values, values_decoded);
+  ASSERT_NO_FATAL_FAILURE(
+      VerifyResults<T>(decoded.data(), int_values.data(), num_values));
+}

Review Comment:
   Since this PR also fixes the encoded data size, can you add a test for that?
   For example, check that the encoded buffer size equals a known expected value
   for input data that would have triggered the bug.



##########
cpp/src/parquet/encoding.cc:
##########
@@ -2250,17 +2253,17 @@ void DeltaBitPackEncoder<DType>::FlushBlock() {
         std::min(values_per_mini_block_, values_current_block_);
 
     const uint32_t start = i * values_per_mini_block_;
-    const UT max_delta = *std::max_element(
+    const T max_delta = *std::max_element(
         deltas_.begin() + start, deltas_.begin() + start + values_current_mini_block);
 
     // The minimum number of bits required to write any of values in deltas_ vector.
     // See overflow comment above.
-    const auto bit_width = bit_width_data[i] =
-        bit_util::NumRequiredBits(max_delta - min_delta);
+    const auto bit_width = bit_width_data[i] = bit_util::NumRequiredBits(
+        static_cast<UT>(SafeSignedSubtract(max_delta, min_delta)));

Review Comment:
   Note that `SafeSignedSubtract` simply does the subtraction in the unsigned
   domain, so you could also write
   `static_cast<UT>(max_delta) - static_cast<UT>(min_delta)`.
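
   The equivalence described in this comment can be illustrated with a minimal,
   standalone sketch. `WrappingSubtract` is a hypothetical stand-in written for
   illustration only; it is not Arrow's actual `SafeSignedSubtract`, and the
   names `RangeViaHelper` and `RangeViaCasts` are likewise assumptions. The
   sketch only covers 32-bit and 64-bit types, where no integer promotion to
   `int` takes place.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in (NOT Arrow's helper): subtract in the unsigned domain,
// where overflow wraps instead of being undefined, then convert back to T.
// Converting an out-of-range unsigned value to a signed type is modular since
// C++20 and implementation-defined (two's complement in practice) before that.
template <typename T>
constexpr T WrappingSubtract(T a, T b) {
  using UT = std::make_unsigned_t<T>;
  return static_cast<T>(static_cast<UT>(a) - static_cast<UT>(b));
}

// Form used in the patch: signed wrapping subtract, then cast to unsigned.
template <typename T>
constexpr std::make_unsigned_t<T> RangeViaHelper(T max_delta, T min_delta) {
  using UT = std::make_unsigned_t<T>;
  return static_cast<UT>(WrappingSubtract(max_delta, min_delta));
}

// Form suggested in the review: cast both operands, subtract as unsigned.
template <typename T>
constexpr std::make_unsigned_t<T> RangeViaCasts(T max_delta, T min_delta) {
  using UT = std::make_unsigned_t<T>;
  return static_cast<UT>(max_delta) - static_cast<UT>(min_delta);
}

// Checks that both forms agree on the extreme case where max - min does not
// fit in the signed type.
inline void CheckRangesAgree() {
  constexpr int32_t lo = std::numeric_limits<int32_t>::min();
  constexpr int32_t hi = std::numeric_limits<int32_t>::max();
  assert(RangeViaHelper<int32_t>(hi, lo) == RangeViaCasts<int32_t>(hi, lo));
}
```

   Both forms yield the full unsigned range when `max_delta` and `min_delta`
   sit at opposite extremes, which is the case where a plain signed
   subtraction would overflow.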
   



