emkornfield commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1604103405
##########
src/main/thrift/parquet.thrift:
##########
@@ -1165,6 +1317,62 @@ struct FileMetaData {
9: optional binary footer_signing_key_metadata
}
+/** Metadata for a column in this file. */
+struct FileColumnMetadataV3 {
+ /** All column chunks in this file (one per row group) **/
+ 1: required list<ColumnChunkV3> columns
Review Comment:
A few other thoughts here:
1. Once we start having different messages here in a lot of languages, the
extra development effort to read a random-access page and extract a
byte array probably isn't too high compared to having completely new types to
parse and translate (i.e. there is going to be a fair bit of boilerplate).
2. Treating the list as a black box of bytes leaves open the possibility
of changing the encoding of the elements if we have data showing that it is a
large improvement.
3. Instead of being an offset, I suppose this could just be modeled in the
message as [bytes]; the main downside to this is Thrift bindings that don't
allow zero copy.
4. I think if we wanted to make implementers' lives easier, it might be a
smaller change to keep all of the existing metadata structures except for
FileMetaData in their current version and specify when new vs. old fields are
populated.
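
To make points 1 and 2 a bit more concrete, here is a rough sketch of what
reading one byte array out of a random-access page could look like. The layout
(a little-endian uint32 count, then count + 1 uint32 offsets, then the
concatenated payloads) and the names `encode_page` and `read_element` are
assumptions made up for illustration; nothing in this PR specifies them.

```python
import struct


def encode_page(elements):
    """Pack byte strings into one page: count, offsets table, then payloads.

    Hypothetical layout, used only for this illustration.
    """
    offsets = [0]
    for element in elements:
        offsets.append(offsets[-1] + len(element))
    # One uint32 for the count, followed by len(offsets) uint32 offsets.
    header = struct.pack(f"<I{len(offsets)}I", len(elements), *offsets)
    return header + b"".join(elements)


def read_element(page, index):
    """Return element `index` as a memoryview without decoding other elements."""
    view = memoryview(page)
    (count,) = struct.unpack_from("<I", view, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    # Offsets are relative to the start of the payload area.
    start, end = struct.unpack_from("<2I", view, 4 + 4 * index)
    data_start = 4 + 4 * (count + 1)
    return view[data_start + start : data_start + end]


if __name__ == "__main__":
    # Each element stands in for one serialized column chunk entry.
    page = encode_page([b"chunk-0", b"chunk-1-longer", b""])
    print(bytes(read_element(page, 1)))
```

Because the reader only ever sees opaque bytes, the encoding of each element
could change later without touching this lookup code, and the returned
memoryview avoids copying the payload.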