mapleFU commented on code in PR #37940:
URL: https://github.com/apache/arrow/pull/37940#discussion_r1341125855
##########
cpp/src/parquet/encoding.cc:
##########
@@ -2250,17 +2269,17 @@ void DeltaBitPackEncoder<DType>::FlushBlock() {
std::min(values_per_mini_block_, values_current_block_);
const uint32_t start = i * values_per_mini_block_;
- const UT max_delta = *std::max_element(
+ const T max_delta = *std::max_element(
deltas_.begin() + start, deltas_.begin() + start +
values_current_mini_block);
// The minimum number of bits required to write any of values in deltas_
vector.
// See overflow comment above.
- const auto bit_width = bit_width_data[i] =
- bit_util::NumRequiredBits(max_delta - min_delta);
+ const auto bit_width = bit_width_data[i] = bit_util::NumRequiredBits(
+ static_cast<UT>(SafeSignedSubtractSigned(max_delta, min_delta)));
Review Comment:
Actually I'm not fully understand here. If the sequence is `{1, 0, -1, 0, 1,
0, -1, 0, 1}`
Previously, the value glows large because `max_delta - min_delta`. The
`max_delta` should be 1, however, it turns to be a huge unsigned int, and the
`min_delta` is 1. The result is correct, however it waste huge space?
##########
cpp/src/parquet/encoding.cc:
##########
@@ -2212,12 +2229,12 @@ void DeltaBitPackEncoder<DType>::Put(const T* src, int
num_values) {
total_value_count_ += num_values;
while (idx < num_values) {
- UT value = static_cast<UT>(src[idx]);
- // Calculate deltas. The possible overflow is handled by use of unsigned
integers
+ T value = src[idx];
+ // Calculate deltas. The possible overflow is handled by use of wide
integers
// making subtraction operations well-defined and correct even in case of
overflow.
// Encoded integers will wrap back around on decoding.
// See http://en.wikipedia.org/wiki/Modular_arithmetic#Integers_modulo_n
- deltas_[values_current_block_] = value - current_value_;
+ deltas_[values_current_block_] = SafeSignedSubtractSigned(value,
current_value_);
Review Comment:
Would a `static_cast<T>(value - current_value_);` be ok here? Originally,
`value` and `current_value_` is unsigned, doing unsigned op would not require
an wider type?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]