sundy-li commented on PR #4299:
URL: https://github.com/apache/arrow-rs/pull/4299#issuecomment-1567035679
Yes, `AsyncFileReader`'s `get_metadata` can work.
But:
1. We don't store the whole metadata of the Parquet files. We store only the
`Vec<ColumnChunkMetaData>` of each leaf column, because we write only one row
group, so the metadata is much simpler and smaller.
2. `ParquetRecordBatchStream` would run the IO tasks in a dedicated async
runtime and do the decoding in blocking threads. But we have completely
separated the two steps: we first fetch the `Bytes` in a dedicated async
runtime and then send the results to a thread pool that decodes them into arrays.
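
To make the separation in point 2 concrete, here is a minimal sketch using only the standard library. It is not our actual code: a plain thread stands in for the dedicated async runtime, and `fetch_chunk`, `decode_chunk`, `ChunkBytes` and `fetch_then_decode` are hypothetical names invented for this illustration. The "decoding" just reads little-endian `u32` values and is not real Parquet decoding.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the raw bytes of one column chunk.
type ChunkBytes = Vec<u8>;

/// Hypothetical IO step: pretend column `c` holds the values c, c+1, c+2.
fn fetch_chunk(column: usize) -> ChunkBytes {
    (0..3u32)
        .flat_map(|i| (column as u32 + i).to_le_bytes())
        .collect()
}

/// Hypothetical decode step: interpret the bytes as little-endian u32 values.
fn decode_chunk(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

/// Fetch every column on one "IO" thread, decode on `num_workers` threads.
pub fn fetch_then_decode(num_columns: usize, num_workers: usize) -> Vec<Vec<u32>> {
    let (bytes_tx, bytes_rx) = mpsc::channel::<(usize, ChunkBytes)>();
    let (array_tx, array_rx) = mpsc::channel::<(usize, Vec<u32>)>();
    // The receiver is shared by all decode workers.
    let bytes_rx = Arc::new(Mutex::new(bytes_rx));

    // IO side (assumption: in practice this would be a dedicated async runtime).
    let io = thread::spawn(move || {
        for column in 0..num_columns {
            bytes_tx.send((column, fetch_chunk(column))).unwrap();
        }
        // `bytes_tx` is dropped here, which closes the channel.
    });

    // Decode side: a small fixed pool of worker threads.
    let workers: Vec<_> = (0..num_workers)
        .map(|_| {
            let rx = Arc::clone(&bytes_rx);
            let tx = array_tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement.
                let msg = rx.lock().unwrap().recv();
                match msg {
                    Ok((column, bytes)) => {
                        tx.send((column, decode_chunk(&bytes))).unwrap()
                    }
                    Err(_) => break, // channel closed: no more chunks
                }
            })
        })
        .collect();
    drop(array_tx); // only the workers hold senders now

    // Collect the decoded arrays by column index.
    let mut out = vec![Vec::new(); num_columns];
    for (column, array) in array_rx {
        out[column] = array;
    }
    io.join().unwrap();
    for worker in workers {
        worker.join().unwrap();
    }
    out
}
```

Calling `fetch_then_decode(4, 2)` fetches four fake column chunks on the IO thread and decodes them on two workers. The point is only that the IO side never decodes and the decode side never does IO; the two communicate solely through the channel of fetched bytes.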