etseidl commented on PR #6582:
URL: https://github.com/apache/arrow-rs/pull/6582#issuecomment-2422652824

   Yes, as @jroddev found, at least part of this issue is tracked by #6447. The 
original reader for the offset index returns an empty vec if offset indexes are 
requested but are not actually present in the file.
   
https://github.com/apache/arrow-rs/blob/dd5a2294b8b28f768b991e0e89fe7686b296c4ec/parquet/src/file/page_index/index_reader.rs#L135
 
   There is a test somewhere that actually expects this behavior (I'll search 
later for that). The async `MetadataLoader` instead leaves the offset index as 
`None` in that case.
   
https://github.com/apache/arrow-rs/blob/dd5a2294b8b28f768b991e0e89fe7686b296c4ec/parquet/src/arrow/async_reader/metadata.rs#L174
   
   @alamb and I felt we couldn't reconcile the two until 54.0.0.
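
   As a rough illustration of the mismatch between the two readers, here is a minimal sketch that uses a generic `Option<Vec<T>>` as a stand-in for the offset index. It does not use the actual parquet metadata types, and `normalize_offset_index` and `has_offset_index` are hypothetical helpers, not part of the crate. It only shows how a consumer could treat "requested but missing" the same way regardless of which reader produced the metadata.

```rust
/// Collapse the two "missing" representations into one.
///
/// Assumption: `Some(vec![])` (as described for the original reader) and
/// `None` (as described for the async `MetadataLoader`) both mean
/// "no offset index in the file".
fn normalize_offset_index<T>(index: Option<Vec<T>>) -> Option<Vec<T>> {
    // `Option::filter` keeps the value only if the predicate holds,
    // so an empty vec becomes `None`.
    index.filter(|entries| !entries.is_empty())
}

/// True only when an offset index is present and non-empty.
fn has_offset_index<T>(index: &Option<Vec<T>>) -> bool {
    matches!(index, Some(entries) if !entries.is_empty())
}
```

   With a helper like this, code that writes or inspects the metadata would not need to care which reader produced it, though whether `Some([])` should be normalized away at all is exactly the question that can't be settled before 54.0.0.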
   
   As to `ParquetMetadataWriter`, I'm honestly not sure what happened in the 
past when the offset index was `Some([])`, so I'll do some digging there. It's 
possible there was a behavior change there.
   
   I'll have more time this afternoon to dig into this and look over this PR.


