AudriusButkevicius commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1629368632
##########
src/main/thrift/parquet.thrift:
##########
@@ -885,6 +971,44 @@ struct ColumnChunk {
9: optional binary encrypted_column_metadata
}
+struct ColumnChunkV3 {
+ /** File where column data is stored. **/
+ 1: optional string file_path
Review Comment:
The _metadata file has all of the row group details and stats. If I have a
dataset that I need to read and filter to a given date (where date is not part
of the filesystem partitioning scheme), I can use the _metadata file to filter
down the row groups, see which files those row groups belong to, and read only
those files. Otherwise I'd have to open every file, read its row group stats,
decide the file doesn't have the date I'm after, close the file, move on to the
next file, and repeat.
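
A rough sketch of the lookup I mean, in Python. It uses made-up in-memory records rather than a real Parquet reader: `RowGroupInfo`, `files_to_read`, the `min_date`/`max_date` fields and the file names are all hypothetical, standing in for the per-row-group `file_path` and column statistics that a _metadata file would carry.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical, simplified stand-in for one row group entry in a _metadata file.
@dataclass(frozen=True)
class RowGroupInfo:
    file_path: str   # file that holds this row group's column data
    min_date: date   # min statistic of the date column in this row group
    max_date: date   # max statistic of the date column in this row group


def files_to_read(row_groups, target):
    """Return the sorted file paths whose row group stats may contain `target`."""
    return sorted({
        rg.file_path
        for rg in row_groups
        if rg.min_date <= target <= rg.max_date
    })


# Illustrative data only: four row groups spread over three files.
row_groups = [
    RowGroupInfo("part-0.parquet", date(2024, 1, 1), date(2024, 1, 31)),
    RowGroupInfo("part-0.parquet", date(2024, 2, 1), date(2024, 2, 29)),
    RowGroupInfo("part-1.parquet", date(2024, 3, 1), date(2024, 3, 31)),
    RowGroupInfo("part-2.parquet", date(2024, 2, 15), date(2024, 3, 10)),
]

selected = files_to_read(row_groups, date(2024, 2, 20))
print(selected)  # part-1.parquet is never opened
```

Without `file_path` on the column chunk there is nothing to map a matching row group back to its file, so this pruning step can't be done from _metadata alone.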
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]