steveloughran commented on PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1724053099
@danielcweeks that's a good point about pluggability.

1. An interface/implementation split in Parquet would line you up to choose the back end later, maybe?
2. I've done an initial pass at a shim library that uses vectored IO operations if the stream/Hadoop version has them, and falls back to the usual blocking reads if not (along with the same for everything else): https://github.com/apache/hadoop-api-shim. But just getting the base vectored IO support into Parquet is a lot simpler. I don't know if the shim would be useful for Iceberg.
3. video on the whole topic

Getting Iceberg to pass down which stripes it wants to read is critical for this to work best with S3, ABFS and GCS. How are you reading the files at present?
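To make points 1 and 2 concrete, here is a minimal sketch of the "use vectored reads if the stream has them, otherwise fall back to blocking reads" pattern behind an interface split. Every name in it (`Range`, `PositionedSource`, `VectoredSource`, `readRanges`, and the two in-memory sources) is hypothetical and invented for illustration; it is not the hadoop-api-shim API or anything in parquet-mr. It also assumes a simple `instanceof` capability check and synchronous results, which a real shim may well do differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: all types here are hypothetical, not Hadoop or Parquet classes.
final class VectoredReadSketch {

    /** A byte range the reader wants: offset and length within the file. */
    static final class Range {
        final int offset;
        final int length;
        Range(int offset, int length) { this.offset = offset; this.length = length; }
    }

    /** Baseline capability: one blocking positioned read per call. */
    interface PositionedSource {
        byte[] read(Range range);
    }

    /** Optional capability: hand over every range in a single call. */
    interface VectoredSource extends PositionedSource {
        List<byte[]> readAll(List<Range> ranges);
    }

    /** The shim: prefer the vectored path, fall back to one read per range. */
    static List<byte[]> readRanges(PositionedSource source, List<Range> ranges) {
        if (source instanceof VectoredSource) {
            return ((VectoredSource) source).readAll(ranges);
        }
        List<byte[]> results = new ArrayList<>(ranges.size());
        for (Range range : ranges) {
            results.add(source.read(range));
        }
        return results;
    }

    static byte[] slice(byte[] data, Range r) {
        return Arrays.copyOfRange(data, r.offset, r.offset + r.length);
    }

    /** Stand-in for an older stream: blocking reads only, counted. */
    static final class BlockingSource implements PositionedSource {
        private final byte[] data;
        int reads = 0;
        BlockingSource(byte[] data) { this.data = data; }
        @Override public byte[] read(Range r) { reads++; return slice(data, r); }
    }

    /** Stand-in for a newer stream: serves all ranges in one batch, counted. */
    static final class BatchingSource implements VectoredSource {
        private final byte[] data;
        int reads = 0;
        int batches = 0;
        BatchingSource(byte[] data) { this.data = data; }
        @Override public byte[] read(Range r) { reads++; return slice(data, r); }
        @Override public List<byte[]> readAll(List<Range> ranges) {
            batches++;
            List<byte[]> results = new ArrayList<>(ranges.size());
            for (Range r : ranges) {
                results.add(slice(data, r));
            }
            return results;
        }
    }

    public static void main(String[] args) {
        byte[] file = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        List<Range> ranges = Arrays.asList(new Range(0, 4), new Range(10, 3));

        BlockingSource older = new BlockingSource(file);
        BatchingSource newer = new BatchingSource(file);

        // The caller code is identical for both sources; only the path taken differs.
        readRanges(older, ranges);
        readRanges(newer, ranges);
        System.out.println("blocking reads issued: " + older.reads);
        System.out.println("batches issued: " + newer.batches + ", single reads: " + newer.reads);
    }
}
```

The caller passes the complete list of ranges up front, which is what lets a store-specific implementation coalesce or parallelise requests. In Hadoop releases that include the vectored IO API, the real entry point is `readVectored(List<? extends FileRange>, IntFunction<ByteBuffer>)`, which returns results asynchronously through each range's future rather than as a plain list.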