mapleFU commented on code in PR #15124:
URL: https://github.com/apache/arrow/pull/15124#discussion_r1060511173


##########
cpp/src/parquet/encoding.cc:
##########
@@ -2479,7 +2481,18 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
       if (ARROW_PREDICT_FALSE(values_current_mini_block_ == 0)) {
         if (ARROW_PREDICT_FALSE(!block_initialized_)) {
           buffer[i++] = last_value_;
-          if (ARROW_PREDICT_FALSE(i == max_values)) break;
+          if (ARROW_PREDICT_FALSE(i == max_values)) {
+            // When block is uninitialized and i reaches max_values we have two
+            // different possibilities:
+            // 1. i == total_value_count_, which means that the page may have only one
+            // value and we should not initialize any block.
+            // 2. i != total_value_count_ which means that user just read the first value
+            // in the page, so we should initialize the incoming block.
+            if (i != static_cast<int>(total_value_count_)) {
+              InitBlock();
+            }

Review Comment:
   By the way, should we add `ARROW_PREDICT_FALSE` here? @pitrou 
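The two cases in the new comment can be illustrated with a deliberately simplified, self-contained model. This is a sketch under stated assumptions, not the real `DeltaBitPackDecoder`. `ToyDecoder`, its `Decode` signature, the `init_calls()` counter, and the constant delta of 1 that stands in for bit-unpacked mini blocks are all hypothetical. Only `block_initialized_`, `total_value_count_`, `last_value_` and `InitBlock()` echo names from the diff. In this model, skipping `InitBlock()` in case 2 would make the next `Decode` call see `block_initialized_ == false` and emit the first value a second time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the first-value handling in the diff.
// Real DELTA_BINARY_PACKED decoding (block headers, bit-unpacking) is omitted;
// every value after the first is modeled as "previous value + 1".
class ToyDecoder {
 public:
  ToyDecoder(uint32_t total_value_count, int64_t first_value)
      : total_value_count_(total_value_count), last_value_(first_value) {}

  // Writes up to max_values values into buffer and returns how many were written.
  int Decode(int64_t* buffer, int max_values) {
    const int remaining = static_cast<int>(total_value_count_) - values_decoded_;
    if (max_values > remaining) max_values = remaining;
    if (max_values <= 0) return 0;

    int i = 0;
    if (!block_initialized_) {
      // The first value of the page comes from the header, not from a block.
      buffer[i++] = last_value_;
      if (i == max_values) {
        // Case 1: i == total_value_count_, the page holds a single value and
        // there is no block to initialize.
        // Case 2: more values follow, so initialize the block now; otherwise
        // the next call would emit the first value again.
        if (i != static_cast<int>(total_value_count_)) {
          InitBlock();
        }
        values_decoded_ += i;
        return i;
      }
      InitBlock();
    }
    // Stand-in for unpacking deltas from the current mini block.
    while (i < max_values) {
      last_value_ += 1;
      buffer[i++] = last_value_;
    }
    values_decoded_ += i;
    return i;
  }

  int init_calls() const { return init_calls_; }

 private:
  // Stand-in for InitBlock(): only marks the block as ready and counts calls.
  void InitBlock() {
    block_initialized_ = true;
    ++init_calls_;
  }

  uint32_t total_value_count_;
  int64_t last_value_;
  bool block_initialized_ = false;
  int values_decoded_ = 0;
  int init_calls_ = 0;
};
```

Reading one value at a time from `ToyDecoder(4, 10)` yields 10, 11, 12, 13, while `ToyDecoder(1, 7)` returns 7 once and never calls `InitBlock()`.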



##########
cpp/src/parquet/encoding_test.cc:
##########
@@ -1324,6 +1324,29 @@ class TestDeltaBitPackEncoding : public TestEncodingBase<Type> {
     CheckRoundtripSpaced(valid_bits, valid_bits_offset);
   }
 
+  void ExecuteSteps(int nvalues, int repeats, int read_batch) {

Review Comment:
   It's OK, but most tests currently don't use it, so I'd like to keep it simple here. If we later want to test `batch size` for other encodings, we can move it into `CheckRoundtrip`.
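The batch-size idea behind `ExecuteSteps` can be expressed in a decoder-agnostic way: drain a decoder in chunks of `read_batch` values and compare the concatenated output with the input. This is a minimal sketch under stated assumptions. `DecodeFn`, `DecodeAllInBatches` and `MakeReplayDecoder` are hypothetical names, not the PR's `ExecuteSteps` or any Parquet API, and the `repeats` parameter is not modeled. In a real test the replay decoder would be replaced by an actual encoder/decoder pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical decode callback: writes up to max_values values into out and
// returns how many were written (0 once the input is exhausted).
using DecodeFn = std::function<int(int64_t* out, int max_values)>;

// Drains `decode` in chunks of at most `read_batch` values and returns the
// concatenated output, so the caller can compare it with the original input.
std::vector<int64_t> DecodeAllInBatches(DecodeFn decode, int read_batch) {
  assert(read_batch > 0);
  std::vector<int64_t> result;
  std::vector<int64_t> batch(static_cast<std::size_t>(read_batch));
  while (true) {
    const int n = decode(batch.data(), read_batch);
    if (n == 0) break;
    result.insert(result.end(), batch.begin(), batch.begin() + n);
  }
  return result;
}

// Hypothetical stand-in for a real decoder: replays a stored vector.
DecodeFn MakeReplayDecoder(std::vector<int64_t> values) {
  std::size_t pos = 0;
  return [values = std::move(values), pos](int64_t* out, int max_values) mutable {
    const std::size_t n =
        std::min(values.size() - pos, static_cast<std::size_t>(max_values));
    std::copy(values.begin() + static_cast<std::ptrdiff_t>(pos),
              values.begin() + static_cast<std::ptrdiff_t>(pos + n), out);
    pos += n;
    return static_cast<int>(n);
  };
}
```

Looping over several `read_batch` values (1, a small prime, the full length, more than the full length) exercises the boundaries where a decoder has to resume mid-block.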



##########
cpp/src/parquet/encoding.cc:
##########
@@ -2479,7 +2481,18 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder<DTyp
       if (ARROW_PREDICT_FALSE(values_current_mini_block_ == 0)) {
         if (ARROW_PREDICT_FALSE(!block_initialized_)) {
           buffer[i++] = last_value_;
-          if (ARROW_PREDICT_FALSE(i == max_values)) break;
+          if (ARROW_PREDICT_FALSE(i == max_values)) {

Review Comment:
   Nice catch, I'll add it.
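For context on the `ARROW_PREDICT_FALSE` question: it is a branch-prediction hint, not a behavioral check, so adding or omitting it on the new `i != total_value_count_` test cannot change correctness. The following is a minimal sketch of how such a macro is commonly written, assuming GCC/Clang's `__builtin_expect`. `MY_PREDICT_FALSE` and `SumNonNegative` are hypothetical names, and Arrow's own definition may differ in detail. C++20 also offers the standard `[[unlikely]]` attribute for the same purpose.

```cpp
#include <cassert>
#include <vector>

// Hypothetical macro following a common pattern for branch hints. On GCC and
// Clang, __builtin_expect(expr, 0) tells the compiler expr is usually 0 (false),
// so the guarded branch can be placed off the hot path. Other compilers get a
// no-op fallback that simply evaluates the condition.
#if defined(__GNUC__) || defined(__clang__)
#define MY_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#else
#define MY_PREDICT_FALSE(x) (x)
#endif

// Sums non-negative entries; negatives are assumed to be rare, so that branch
// is annotated as unlikely. The hint affects code layout only, not the result.
long SumNonNegative(const std::vector<int>& values) {
  long sum = 0;
  for (int v : values) {
    if (MY_PREDICT_FALSE(v < 0)) continue;
    sum += v;
  }
  return sum;
}
```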



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
