JFinis commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1629354120
##########
src/main/thrift/parquet.thrift:
##########
@@ -885,6 +971,44 @@ struct ColumnChunk {
9: optional binary encrypted_column_metadata
}
+struct ColumnChunkV3 {
+ /** File where column data is stored. **/
+ 1: optional string file_path
Review Comment:
@adamreeve Interesting, you're the first person I have ever heard of who would
use this external file feature.
However, the field does not seem to fully match what you are describing. The
`file_path` field defines where the *data* itself lives, not the metadata. Or
are you saying that the "main" parquet file is your `_metadata` file and the
actual data lives in other files?
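To make sure we are talking about the same thing, here is a minimal sketch of how I read the field. It assumes the semantics documented for the existing `ColumnChunk.file_path` (an unset path means "same file as the metadata", otherwise the path is relative to the file containing the metadata). The helper name `resolve_chunk_data_file` and the example paths are hypothetical, not part of any Parquet library.

```python
import posixpath
from typing import Optional


def resolve_chunk_data_file(metadata_file: str, file_path: Optional[str]) -> str:
    """Return the file holding a column chunk's data (hypothetical helper)."""
    # Unset (or empty) file_path: the data lives in the same file as the metadata.
    if not file_path:
        return metadata_file
    # Otherwise, interpret file_path relative to the file that holds the metadata.
    base_dir = posixpath.dirname(metadata_file)
    return posixpath.normpath(posixpath.join(base_dir, file_path))
```

Under this reading, a `_metadata` file at `/data/table/_metadata` with `file_path = "part-0.parquet"` would point at `/data/table/part-0.parquet`, while a chunk without `file_path` would point back at the `_metadata` file itself.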
Can you elaborate on what advantage you see here? I can't grasp how putting
data into a different file would help with network file systems. Our system
(and probably the majority in the industry) runs on cloud object stores, and
there, having data in separate files wouldn't help at all. So how does it help
in your use case? How does it reduce the number of system operations?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]