mapleFU opened a new issue, #38880:
URL: https://github.com/apache/arrow/issues/38880

   ### Describe the enhancement requested
   
   `parquet::ColumnReader::HasNextInternal` might call `ReadNewPage` to check the record boundary.
   
   ```c++
     bool HasNextInternal() {
       // Either there is no data page available yet, or the data page has been
       // exhausted
       if (num_buffered_values_ == 0 || num_decoded_values_ == num_buffered_values_) {
         if (!ReadNewPage() || num_buffered_values_ == 0) {
           return false;
         }
       }
       return true;
     }
   ```
   
   And `ReadNewPage` will call:
   
   ```c++
     // Advance to the next data page
     bool ReadNewPage() {
       // Loop until we find the next data page.
       while (true) {
         current_page_ = pager_->NextPage();
         if (!current_page_) {
           // EOS
           return false;
         }
   ```
   
   When a `data_page_filter` is set and the file uses the v1 data page format, it seems that `NextPage` might filter out (skip) a data page during this check?
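
To make the question concrete, here is a minimal toy model of the control flow above. It is not Arrow's implementation: `ToyPage`, `ToyPager`, `ToyReader`, and the filter signature are all hypothetical names made up for illustration, and the filter is assumed to return `true` for pages that should be skipped. It only shows that a "has next" check which pulls a new page can silently drop pages when the pager applies a filter.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data page: only its id and value count matter.
struct ToyPage {
  int id;
  int num_values;
};

// Hypothetical pager that applies a filter inside NextPage().
class ToyPager {
 public:
  using Filter = std::function<bool(const ToyPage&)>;  // true means "skip"

  ToyPager(std::vector<ToyPage> pages, Filter filter)
      : pages_(std::move(pages)), filter_(std::move(filter)) {}

  std::optional<ToyPage> NextPage() {
    while (pos_ < pages_.size()) {
      const ToyPage& page = pages_[pos_++];
      if (filter_ && filter_(page)) {
        ++num_skipped_;  // dropped before the reader ever sees it
        continue;
      }
      return page;
    }
    return std::nullopt;  // end of stream
  }

  int num_skipped() const { return num_skipped_; }

 private:
  std::vector<ToyPage> pages_;
  Filter filter_;
  std::size_t pos_ = 0;
  int num_skipped_ = 0;
};

// Toy reader that mirrors the HasNextInternal / ReadNewPage shape quoted above.
class ToyReader {
 public:
  explicit ToyReader(ToyPager* pager) : pager_(pager) {}

  bool HasNext() {
    if (num_buffered_values_ == 0 || num_decoded_values_ == num_buffered_values_) {
      if (!ReadNewPage() || num_buffered_values_ == 0) {
        return false;
      }
    }
    return true;
  }

  // Pretend every value of the current page has been decoded.
  void ConsumeCurrentPage() { num_decoded_values_ = num_buffered_values_; }

  int current_page_id() const { return current_page_id_; }

 private:
  bool ReadNewPage() {
    std::optional<ToyPage> page = pager_->NextPage();
    if (!page) {
      return false;  // EOS
    }
    current_page_id_ = page->id;
    num_buffered_values_ = page->num_values;
    num_decoded_values_ = 0;
    return true;
  }

  ToyPager* pager_;
  int current_page_id_ = -1;
  int num_buffered_values_ = 0;
  int num_decoded_values_ = 0;
};
```

With pages 0, 1, 2 and a filter that rejects page 1, the `HasNext()` call made after page 0 is consumed lands directly on page 2; page 1 is dropped inside that call. If a record could continue from page 0 into page 1, the boundary check itself would have skipped part of it, which is the behavior this issue asks about.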
   
   ### Component(s)
   
   C++, Parquet


