wgtmac commented on code in PR #34668:
URL: https://github.com/apache/arrow/pull/34668#discussion_r1144166948
##########
cpp/src/parquet/encoding.cc:
##########
@@ -2708,7 +2708,14 @@ class DeltaLengthByteArrayDecoder : public DecoderImpl,
void SetData(int num_values, const uint8_t* data, int len) override {
num_values_ = num_values;
- if (len == 0) return;
+ if (len == 0) {
+ if (num_values > 0) {
Review Comment:
I did a quick check. `num_values` is the number of values in a data page,
with nulls included.
https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc#L922
```cpp
current_decoder_->SetData(static_cast<int>(num_buffered_values_), buffer,
static_cast<int>(data_size));
```
https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc#L949
```cpp
// The total number of values stored in the data page. This is the maximum of
// the number of encoded definition levels or encoded values. For
// non-repeated, required columns, this is equal to the number of encoded
// values. For repeated or optional values, there may be fewer data values
// than levels, and this tells you how many encoded levels there are in that
// case.
int64_t num_buffered_values_;
```
Therefore, the input `num_values > 0 && len == 0` looks valid to me (for
example, a page whose values are all null). Should we decode nothing here
instead of throwing?
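
To make the suggestion concrete, here is a minimal sketch of the "decode nothing" behavior. `SketchDecoder`, `num_valid_values_`, and `values_left()` are hypothetical names for illustration only, not the actual `DeltaLengthByteArrayDecoder` members, and the non-empty branch is deliberately simplified (the real decoder would read the value count from the encoded header).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a decoder; not Arrow's DeltaLengthByteArrayDecoder.
class SketchDecoder {
 public:
  // num_values counts levels in the page (nulls included);
  // len is the size in bytes of the encoded payload.
  void SetData(int num_values, const uint8_t* data, int len) {
    if (num_values < 0 || len < 0) {
      throw std::invalid_argument("negative num_values or len");
    }
    (void)data;  // the payload itself is not parsed in this sketch
    num_values_ = num_values;
    // An empty payload means there are no encoded (non-null) values,
    // even when num_values_ > 0, so there is simply nothing to decode.
    // Simplification: otherwise assume every slot holds an encoded value.
    num_valid_values_ = (len == 0) ? 0 : num_values;
  }

  // Returns how many values were decoded, at most max_values.
  int Decode(int max_values) {
    assert(max_values >= 0);
    const int n = max_values < num_valid_values_ ? max_values : num_valid_values_;
    num_valid_values_ -= n;
    return n;
  }

  int values_left() const { return num_valid_values_; }

 private:
  int num_values_ = 0;
  int num_valid_values_ = 0;
};
```

With this shape, `SetData(5, nullptr, 0)` followed by `Decode(5)` returns 0 rather than raising, which is the behavior I am proposing for the all-null page case.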