etseidl commented on code in PR #8071:
URL: https://github.com/apache/arrow-rs/pull/8071#discussion_r2261631007
##########
parquet/src/file/metadata/reader.rs:
##########
@@ -105,7 +126,15 @@ impl FooterTail {
impl ParquetMetaDataReader {
/// Create a new [`ParquetMetaDataReader`]
pub fn new() -> Self {
- Default::default()
+ Self {
Review Comment:
Is this change necessary? Won't the policies default to `Skip` anyway?
##########
parquet/src/file/metadata/reader.rs:
##########
@@ -593,7 +642,15 @@ impl ParquetMetaDataReader {
col_idx,
)
}
- None => Err(general_err!("missing offset index")),
+ None => {
Review Comment:
This change here is the heart of the PR. The thing that gives me pause is
simply replacing missing indexes with empty vectors doesn't let the user know
that the indexes are in a potentially unusable state. Are the indexes missing
for a single column chunk? An entire row group? We can't really tell without
doing a validation step after decoding is complete.
I think if we move forward with this, I'd prefer rather than inserting
invalid indexes, we instead invalidate the entire page index (i.e. set column
and offset index back to `None` in the `ParquetMetaData`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]