mapleFU opened a new issue, #38880:
URL: https://github.com/apache/arrow/issues/38880
### Describe the enhancement requested
`parquet::ColumnReader::HasNextInternal` may call `ReadNewPage` when checking
the record boundary:
```c++
bool HasNextInternal() {
  // Either there is no data page available yet, or the data page has been
  // exhausted
  if (num_buffered_values_ == 0 || num_decoded_values_ == num_buffered_values_) {
    if (!ReadNewPage() || num_buffered_values_ == 0) {
      return false;
    }
  }
  return true;
}
```
`ReadNewPage` in turn calls `pager_->NextPage()`:
```c++
// Advance to the next data page
bool ReadNewPage() {
  // Loop until we find the next data page.
  while (true) {
    current_page_ = pager_->NextPage();
    if (!current_page_) {
      // EOS
      return false;
    }
```
When a `data_page_filter` is set and the file uses the v1 data page format, it
seems that `NextPage` might filter out the data page during this check?
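
To make the concern concrete, here is a minimal toy model of the interaction. It is not Arrow's implementation: `ToyPage`, `ToyPager`, `ToyReader` and `PagesSkippedDuringBoundaryCheck` are hypothetical names, and the filter is a plain callback standing in for `data_page_filter`. It only shows that a has-next check which pulls from a filtering `NextPage` can move past a page as a side effect.

```c++
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data page; only the value count matters here.
struct ToyPage {
  int num_values;
};

// Hypothetical pager: NextPage() silently drops pages rejected by the filter.
class ToyPager {
 public:
  ToyPager(std::vector<ToyPage> pages, std::function<bool(const ToyPage&)> skip)
      : pages_(std::move(pages)), skip_(std::move(skip)) {}

  std::optional<ToyPage> NextPage() {
    while (pos_ < pages_.size()) {
      ToyPage page = pages_[pos_++];
      if (skip_ && skip_(page)) {
        ++num_skipped_;  // the caller never sees this page
        continue;
      }
      return page;
    }
    return std::nullopt;  // end of stream
  }

  int num_skipped() const { return num_skipped_; }

 private:
  std::vector<ToyPage> pages_;
  std::function<bool(const ToyPage&)> skip_;
  std::size_t pos_ = 0;
  int num_skipped_ = 0;
};

// Hypothetical reader that mirrors the shape of HasNextInternal/ReadNewPage.
class ToyReader {
 public:
  explicit ToyReader(ToyPager* pager) : pager_(pager) {}

  bool HasNext() {
    if (num_buffered_values_ == 0 || num_decoded_values_ == num_buffered_values_) {
      if (!ReadNewPage() || num_buffered_values_ == 0) {
        return false;
      }
    }
    return true;
  }

  // Pretend every value of the current page has been decoded.
  void ConsumeAll() { num_decoded_values_ = num_buffered_values_; }

 private:
  bool ReadNewPage() {
    std::optional<ToyPage> page = pager_->NextPage();
    if (!page) return false;
    num_buffered_values_ = page->num_values;
    num_decoded_values_ = 0;
    return true;
  }

  ToyPager* pager_;
  int num_buffered_values_ = 0;
  int num_decoded_values_ = 0;
};

// Counts pages dropped by a HasNext() call made only to check the boundary.
int PagesSkippedDuringBoundaryCheck() {
  ToyPager pager({ToyPage{3}, ToyPage{5}, ToyPage{2}},
                 [](const ToyPage& p) { return p.num_values == 5; });
  ToyReader reader(&pager);
  reader.HasNext();     // loads the first page
  reader.ConsumeAll();  // first page exhausted
  reader.HasNext();     // "just a check", but it advances past the filtered page
  return pager.num_skipped();
}
```

In this sketch the second `HasNext()` call is what drops the middle page, which is the behaviour the question above asks about for the real reader.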
### Component(s)
C++, Parquet