Patzifist opened a new issue, #814:
URL: https://github.com/apache/arrow-go/issues/814

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   Hi team,
   
   I’m currently configuring the Parquet package in arrow-go (v18) for high-performance data 
ingestion and I have a question regarding the utility of the Page Index (Column 
Index / Offset Index).
   
   In my current setup, I see that PageIndexEnabled can be toggled in 
WriterProperties. However, after digging into the arrow-go reader and scanner 
implementations, I couldn't find clear evidence that the Page Index is being 
used to perform page-level skipping during queries.
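   
   To make the question concrete, here is a minimal sketch of the kind of page-level skipping I mean. The `pageStats` type and the `pagesToRead` function are hypothetical illustrations, not arrow-go APIs, and the sketch assumes an int64 column filtered by a closed range predicate:

```go
package main

import "fmt"

// pageStats is a hypothetical stand-in for one Column Index entry (Min, Max)
// combined with the matching Offset Index entry (FirstRowIndex).
// It is NOT an arrow-go type.
type pageStats struct {
	Min, Max      int64 // smallest and largest value stored in the page
	FirstRowIndex int64 // index of the page's first row within the row group
}

// pagesToRead returns the indexes of the pages whose [Min, Max] range
// overlaps the predicate range [lo, hi]. All other pages can be skipped
// without being read or decoded.
func pagesToRead(pages []pageStats, lo, hi int64) []int {
	var keep []int
	for i, p := range pages {
		if p.Max < lo || p.Min > hi {
			continue // no value in this page can satisfy the predicate
		}
		keep = append(keep, i)
	}
	return keep
}

func main() {
	// Three pages of one column chunk, 1000 rows each (made-up numbers).
	pages := []pageStats{
		{Min: 0, Max: 99, FirstRowIndex: 0},
		{Min: 100, Max: 199, FirstRowIndex: 1000},
		{Min: 200, Max: 299, FirstRowIndex: 2000},
	}
	// Predicate: 150 <= value <= 160. Only the second page can match.
	fmt.Println(pagesToRead(pages, 150, 160))
}
```

   Row-group-level filtering would have to read all three pages in this example, whereas page-level pruning would read only the second one.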
   
   Questions:
   
   1. Read-side support: Does the current arrow-go Parquet reader or the 
higher-level Scanner API actually implement page-level pruning using the Page 
Index? Or is filtering still limited to Row Group boundaries?
   2. Writing strategy: If the Go reader doesn't support it yet, is there any 
reason to enable it during the write phase other than compatibility with 
external engines (like Spark or Trino)?
   3. Overhead: Are there any significant performance penalties when writing 
files with Page Index enabled in a Go-centric environment, given the extra 
metadata management?
    
   I want to avoid including "dead weight" metadata in my files if it doesn't 
provide any performance benefits within the Go ecosystem.
   
   Looking forward to your clarification.
   
   ### Component(s)
   
   Parquet

