pitrou commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1603917314
##########
src/main/thrift/parquet.thrift:
##########
@@ -1165,6 +1317,62 @@ struct FileMetaData {
9: optional binary footer_signing_key_metadata
}
+/** Metadata for a column in this file. */
+struct FileColumnMetadataV3 {
+ /** All column chunks in this file (one per row group) **/
+ 1: required list<ColumnChunkV3> columns
+
+ /** Sort order used for the Statistics min_value and max_value fields
+ **/
+ 2: optional ColumnOrder column_order;
+
+ /** NEW from v1: Optional key/value metadata for this column at the file level
+ **/
+ 3: optional list<KeyValue> key_value_metadata
Review Comment:
This is file-level metadata. Do we have reasons to believe there will be
enough key-values to warrant the complexity of using dedicated data pages for
this?
The more we deviate from the Parquet 1 file organization, the more work it will
create for implementors and the more potential there is for incompatibilities and bugs.
We should perhaps ask on the ML for opinions...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.