mapleFU commented on code in PR #37127:
URL: https://github.com/apache/arrow/pull/37127#discussion_r1292370848
##########
cpp/src/parquet/encoding.cc:
##########
@@ -1171,11 +1171,9 @@ int PlainBooleanDecoder::Decode(uint8_t* buffer, int
max_values) {
int PlainBooleanDecoder::Decode(bool* buffer, int max_values) {
max_values = std::min(max_values, num_values_);
- if (bit_reader_->GetBatch(1, buffer, max_values) != max_values) {
- ParquetException::EofException();
- }
- num_values_ -= max_values;
- return max_values;
+ int num_values_read = bit_reader_->GetBatch(1, buffer, max_values);
Review Comment:
Ooops, the `Decode` function has declaration:
```c++
/// \brief Decode values into a buffer
///
/// Subclasses may override the more specialized Decode methods below.
///
/// \param[in] buffer destination for decoded values
/// \param[in] max_values maximum number of values to decode
/// \return The number of values decoded. Should be identical to
max_values except
/// at the end of the current data page.
virtual int Decode(T* buffer, int max_values) = 0;
```
Would the `max_values` beyond the real existing value?
And it's a bit hard to mock a testing here, because when we have the data
below:
```
Page: 10000 values
NullValues: 1
non-null: 9999
```
The num_bytes should be `1250`, and the decoder would not know the trailing
bit is a padding bit, so it could return `10000` values. However, other plain
types can find that the bytes are already consumed.
Actually, it would require outsize to provide the right value.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]