Re: [PR] DRAFT: Parquet 3 metadata with decoupled column metadata [parquet-format]

via GitHub Thu, 06 Jun 2024 04:37:00 -0700


JFinis commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1629354120



##########
src/main/thrift/parquet.thrift:
##########
@@ -885,6 +971,44 @@ struct ColumnChunk {
   9: optional binary encrypted_column_metadata
 }
 
+struct ColumnChunkV3 {
+  /** File where column data is stored. **/
+  1: optional string file_path

Review Comment:
   @adamreeve Interesting, you're the first person I have ever heard of who would 
use this external file feature. 
   
   However, the field does not seem to fully match what you are describing. The 
`file_path` field defines where the *data* itself lives, not the metadata. Or 
are you saying that the "main" parquet file is your `_metadata` file and 
the actual data lives in other files?
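
A minimal sketch of the `file_path` semantics in question, assuming an unset 
field means the data lives in the same file as the metadata and a set field is 
resolved relative to that file. `ChunkRef` and `resolve_data_file` are 
hypothetical names for illustration, not part of any Parquet library.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChunkRef:
    """Hypothetical stand-in for the one ColumnChunk field discussed here."""
    file_path: Optional[str] = None  # None: data is in the metadata file itself


def resolve_data_file(metadata_file: str, chunk: ChunkRef) -> str:
    """Return the path of the file that holds the chunk's data."""
    if chunk.file_path is None:
        # No external file: the column data sits next to the footer.
        return metadata_file
    # Assumption: file_path is relative to the directory of the metadata file.
    base_dir = posixpath.dirname(metadata_file)
    return posixpath.normpath(posixpath.join(base_dir, chunk.file_path))
```

Under these assumptions, a `_metadata` file at `dataset/_metadata` whose chunk 
has `file_path = "part-0.parquet"` would point at `dataset/part-0.parquet`, 
while a chunk without `file_path` would resolve to `dataset/_metadata` itself.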
   
   Can you elaborate on what advantage you see here? I can't grasp how putting 
data into a different file would help with network file systems. Our system 
(and probably the majority in the industry) runs on cloud object stores, and 
there having data in separate files wouldn't help at all. So how does it help 
in your use case? How does it reduce the number of system operations?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
