adamreeve commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1629398130
##########
src/main/thrift/parquet.thrift:
##########
@@ -885,6 +971,44 @@ struct ColumnChunk {
9: optional binary encrypted_column_metadata
}
+struct ColumnChunkV3 {
+ /** File where column data is stored. **/
+ 1: optional string file_path
Review Comment:
I think Audrius has answered your question, @JFinis (Audrius and I are work
colleagues). I just want to add that I don't think what we're doing is
particularly unusual: this is a feature of the Arrow Dataset library, so
presumably there are other users of this API (see
[pyarrow.dataset.parquet_dataset](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.parquet_dataset.html)).
The benefits of using this `_metadata` index file should also carry over to
cloud object stores, since it reduces the number of objects/files that have
to be read.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]