wgtmac commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1608557867
##########
src/main/thrift/parquet.thrift:
##########
@@ -835,6 +864,65 @@ struct ColumnMetaData {
16: optional SizeStatistics size_statistics;
}
+struct ColumnChunkMetaDataV3 {
+ /** REMOVED from v1: type (redundant with SchemaElementV3) */
+ /** REMOVED from v1: encodings (unnecessary and non-trivial to get right) */
+ /** REMOVED from v1: path_in_schema (unnecessary and wasteful) */
+ /** REMOVED from v1: index_page_offset (unused in practice?) */
+
+ /** Compression codec **/
+ 1: required CompressionCodec codec
Review Comment:
Enforcing the same codec for all row groups would rule out fast merging of row
groups from different Parquet files: any column chunk compressed with a
different codec would have to be rewritten. So I vote for keeping it as is.
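To make the merge cost concrete, here is a minimal Python sketch. It is a toy model, not parquet-java or Arrow code: `ColumnChunk`, `RowGroup` and `merge_row_groups` are hypothetical names, and the enum only mirrors a subset of the `CompressionCodec` values in parquet.thrift. It counts how many column chunks a merge would have to rewrite when the codec is stored per column chunk, compared with a hypothetical rule that all row groups share one codec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class CompressionCodec(Enum):
    # Subset of the codec values defined in parquet.thrift.
    UNCOMPRESSED = 0
    SNAPPY = 1
    GZIP = 2
    ZSTD = 6


@dataclass(frozen=True)
class ColumnChunk:
    codec: CompressionCodec
    compressed_pages: bytes  # opaque page bytes as stored in the source file


@dataclass(frozen=True)
class RowGroup:
    columns: Tuple[ColumnChunk, ...]


def merge_row_groups(
    files: List[List[RowGroup]],
    file_codec: Optional[CompressionCodec] = None,
) -> Tuple[List[RowGroup], int]:
    """Concatenate the row groups of several files (hypothetical helper).

    file_codec=None models the current layout, where each column chunk
    records its own codec, so every chunk can be copied byte-for-byte.
    A non-None file_codec models a hypothetical "one codec for all row
    groups" rule: chunks written with any other codec would need to be
    decompressed and recompressed. This sketch only counts those chunks;
    it does not recompress them.
    """
    merged: List[RowGroup] = []
    chunks_to_rewrite = 0
    for row_groups in files:
        for rg in row_groups:
            if file_codec is not None:
                chunks_to_rewrite += sum(
                    1 for chunk in rg.columns if chunk.codec != file_codec
                )
            merged.append(rg)
    return merged, chunks_to_rewrite


if __name__ == "__main__":
    file_a = [RowGroup(columns=(ColumnChunk(CompressionCodec.SNAPPY, b"a0"),
                                ColumnChunk(CompressionCodec.SNAPPY, b"a1")))]
    file_b = [RowGroup(columns=(ColumnChunk(CompressionCodec.ZSTD, b"b0"),
                                ColumnChunk(CompressionCodec.ZSTD, b"b1")))]

    _, per_chunk = merge_row_groups([file_a, file_b])
    _, shared = merge_row_groups([file_a, file_b], CompressionCodec.SNAPPY)
    print(per_chunk, shared)  # 0 2
```

With the codec kept in the column chunk metadata, merging a SNAPPY file and a ZSTD file touches no chunk data; under a single shared codec, every chunk of the ZSTD file would have to be rewritten.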
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.