alamb commented on issue #6002: URL: https://github.com/apache/arrow-rs/issues/6002#issuecomment-2226856342
> I got my thing working, but it seems quite brittle. TLDR is that I'm just tracking what bytes DataFusion reads and then slicing to those. Which seems like it could be quite inefficient and might break if DataFusion changes internal details.

Good to hear you got it working. Yes, I agree that working out a more flexible and more efficient API would be ideal.

As I think you are hinting at, `MetadataLoader` was designed for the specific needs of the parquet reader, so it is not easy to use outside of it.

Maybe a good place to start would be to write tests / examples of what you are trying to do. For example:

1. Read and decode metadata from a parquet footer
   * with/without the offset index
   * with/without bloom filters
   * when the initial pre-fetch didn't include the bytes for the FileMetadata
   * when the initial pre-fetch didn't include the bytes for some of the out-of-line structures (offset index, bloom filters)

Also, are you trying to support the case where you already have bytes in memory that you want to decode parquet metadata from?
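The pre-fetch cases in the list above hinge on the Parquet footer layout: a file ends with the Thrift-encoded `FileMetaData`, then a 4-byte little-endian length of that metadata, then the magic bytes `PAR1`. A minimal, std-only sketch of checking whether a pre-fetched tail of the file already holds the whole metadata might look like the following. `probe_footer` and `FooterProbe` are hypothetical names, not the `parquet` crate's API, and Thrift decoding of the returned bytes is deliberately left out.

```rust
/// Fixed footer size: 4-byte little-endian metadata length followed by b"PAR1".
const FOOTER_SIZE: usize = 8;
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical result of inspecting a pre-fetched tail of a Parquet file.
#[derive(Debug, PartialEq)]
enum FooterProbe<'a> {
    /// The tail contains the complete (still Thrift-encoded) metadata bytes.
    Complete(&'a [u8]),
    /// The tail is too short: this many more bytes, immediately preceding
    /// the tail in the file, must be fetched before decoding is possible.
    NeedMore(usize),
}

/// Hypothetical helper: `tail` is assumed to be the last N bytes of the file.
fn probe_footer(tail: &[u8]) -> Result<FooterProbe<'_>, String> {
    if tail.len() < FOOTER_SIZE {
        return Err(format!(
            "need at least {} bytes to read the footer, got {}",
            FOOTER_SIZE,
            tail.len()
        ));
    }
    // The last 8 bytes are the length field plus the magic.
    let footer = &tail[tail.len() - FOOTER_SIZE..];
    if &footer[4..] != PARQUET_MAGIC {
        return Err("missing PAR1 magic at end of file".to_string());
    }
    let metadata_len =
        u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;
    // Bytes in the tail that precede the fixed footer.
    let available = tail.len() - FOOTER_SIZE;
    if metadata_len > available {
        // The initial pre-fetch was too small; report the shortfall.
        Ok(FooterProbe::NeedMore(metadata_len - available))
    } else {
        // The metadata sits directly before the fixed footer.
        Ok(FooterProbe::Complete(&tail[available - metadata_len..available]))
    }
}
```

A caller that gets `NeedMore(n)` would issue one more ranged read for the `n` bytes before the tail and retry. The offset index and bloom filters are stored outside the footer, so a similar "did the pre-fetch cover this range?" check would be needed for them once the metadata has been decoded and their offsets are known.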
