Patzifist opened a new issue, #814:
URL: https://github.com/apache/arrow-go/issues/814
### Describe the usage question you have. Please include as many useful details as possible.
Hi team,
I’m currently configuring the arrow-go Parquet package (v18) for high-performance data
ingestion, and I have a question about the utility of the Page Index (Column
Index / Offset Index).
In my current setup, I see that `PageIndexEnabled` can be toggled in
`WriterProperties`. However, after digging into the arrow-go reader and scanner
implementations, I couldn't find clear evidence that the Page Index is being
used to perform page-level skipping during queries.
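To make "page-level skipping" concrete, here is a minimal, hypothetical sketch in plain Go; it uses no arrow-go APIs. The names `PageStats`, `RowRange` and `PruneEquals` are illustrative only, loosely mirroring the Parquet format's `ColumnIndex` (per-page `null_pages`, `min_values`, `max_values`) and `OffsetIndex` (`first_row_index` of each `PageLocation`). It assumes a single int64 column and an equality predicate.

```go
package main

import "fmt"

// PageStats is a hypothetical, simplified view of one page: its ColumnIndex
// entry (null flag plus min/max bounds) and the first_row_index from the
// matching OffsetIndex entry.
type PageStats struct {
	NullPage      bool  // true if the page holds only nulls
	Min, Max      int64 // lower/upper bounds for the page (ignored if NullPage)
	FirstRowIndex int64 // row number of the first row in this page
}

// RowRange is a half-open interval [Start, End) of row numbers.
type RowRange struct{ Start, End int64 }

// PruneEquals returns the row ranges whose pages may contain v.
// Pages whose [Min, Max] excludes v, or that are all-null, are skipped
// without being read or decoded. Adjacent surviving pages are merged.
func PruneEquals(pages []PageStats, numRows int64, v int64) []RowRange {
	var out []RowRange
	for i, p := range pages {
		if p.NullPage || v < p.Min || v > p.Max {
			continue // this page cannot contain v
		}
		// A page ends where the next page starts (or at the end of the row group).
		end := numRows
		if i+1 < len(pages) {
			end = pages[i+1].FirstRowIndex
		}
		if n := len(out); n > 0 && out[n-1].End == p.FirstRowIndex {
			out[n-1].End = end // extend the previous contiguous range
		} else {
			out = append(out, RowRange{p.FirstRowIndex, end})
		}
	}
	return out
}

func main() {
	pages := []PageStats{
		{Min: 0, Max: 99, FirstRowIndex: 0},
		{Min: 100, Max: 199, FirstRowIndex: 1000},
		{Min: 150, Max: 299, FirstRowIndex: 2000},
		{NullPage: true, FirstRowIndex: 3000},
	}
	fmt.Println(PruneEquals(pages, 4000, 160)) // prints [{1000 3000}]
}
```

With row-group-only pruning, the whole 4000-row group would be decoded for `v == 160`; with the page index, only rows 1000 to 2999 need to be read.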
Questions:
1. Read-side support: Does the current arrow-go Parquet reader or the
higher-level Scanner API actually implement page-level pruning using the Page
Index? Or is filtering still limited to Row Group boundaries?
2. Writing strategy: If the Go reader doesn't support it yet, is there any
reason to enable it during the write phase other than compatibility with
external engines (like Spark or Trino)?
3. Overhead: Are there any significant performance penalties when writing
files with Page Index enabled in a Go-centric environment, given the extra
metadata management?
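As a rough way to size the storage side of question 3, here is a back-of-envelope sketch in plain Go with hypothetical helper names (`PagesPerChunk`, `RawPageIndexBytes`). It assumes a 128 MiB column chunk, a 1 MiB target data page size and an int64 column. It counts only raw field widths of the `OffsetIndex` (`offset`, `compressed_page_size`, `first_row_index`) and `ColumnIndex` (min, max, `null_pages` flag, optional `null_counts`) and ignores Thrift compact-protocol encoding, so the real serialized size will differ.

```go
package main

import "fmt"

// Raw width of one OffsetIndex PageLocation:
// offset (i64) + compressed_page_size (i32) + first_row_index (i64).
const offsetIndexEntryBytes = 8 + 4 + 8

// PagesPerChunk returns ceil(chunkBytes / pageBytes), the approximate
// number of data pages (and thus index entries) in one column chunk.
func PagesPerChunk(chunkBytes, pageBytes int64) int64 {
	return (chunkBytes + pageBytes - 1) / pageBytes
}

// RawPageIndexBytes estimates the raw OffsetIndex + ColumnIndex payload
// for one column chunk. statBytes is the width of one min or max value.
func RawPageIndexBytes(pages, statBytes int64) int64 {
	// min + max + null_pages flag (1 byte) + null_count (i64)
	columnIndexEntry := 2*statBytes + 1 + 8
	return pages * (offsetIndexEntryBytes + columnIndexEntry)
}

func main() {
	const mib = 1 << 20
	pages := PagesPerChunk(128*mib, 1*mib)
	fmt.Println(pages, RawPageIndexBytes(pages, 8)) // prints 128 5760
}
```

Under these assumptions that is roughly 5.6 KiB of raw index fields per 128 MiB column chunk. It says nothing about write-time CPU or memory cost, which is the other half of the question.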
I want to avoid including "dead weight" metadata in my files if it doesn't
provide any performance benefits within the Go ecosystem.
Looking forward to your clarification.
### Component(s)
Parquet