JFinis commented on code in PR #250:
URL: https://github.com/apache/parquet-format/pull/250#discussion_r1626057631
##########
src/main/thrift/parquet.thrift:
##########
@@ -883,13 +928,42 @@ struct ColumnChunk {
/** Encrypted column metadata for this chunk **/
9: optional binary encrypted_column_metadata
+ /**
+ * The column order for this chunk.
+ *
+ * If not set readers should check FileMetadata.column_orders
+ * instead.
+ *
+ * Populated in both PAR1 and PAR3
+ */
+ 10: optional ColumnOrder column_order
+ /** Set to true if all pages in the column chunk are dictionary
+ * encoded
+ */
+ 11: optional bool all_pages_dictionary_encoded
+ /**
+ * The index to the SchemaElement in FileMetadata for this
+ * column.
+ */
+ 12: optional i32 schema_index
Review Comment:
As discussed on the mailing list, how about this alternative design:
Turn things around: instead of having a schema_index in the ColumnMetadata,
we could have a column_metadata_index in the schema. If that index is
missing or -1, the column is empty, so no metadata will be present for it.
With this, we would get the best of both worlds: we would always have O(1)
random I/O, even for such empty columns (as we would use the
column_metadata_index for the lookup), and we would not need to store any
ColumnMetadata for empty columns.
After giving this a second thought, this also makes more sense in general. As
the navigation direction is usually from schema to metadata (not vice
versa!), the schema should point us to the correct metadata instead of the
metadata pointing us to the correct schema entry.
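A minimal sketch of the lookup this design would allow, written in Python only for brevity. The classes, the `find_column_metadata` helper, and the exact shape of `column_metadata_index` are assumptions for illustration; none of them exist in parquet.thrift today.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnMetadata:
    # Stand-in for the per-column metadata; a single field for illustration.
    num_values: int


@dataclass
class SchemaElement:
    name: str
    # Hypothetical field: position of this column's metadata in the
    # metadata list. None or -1 means the column is empty.
    column_metadata_index: Optional[int] = None


def find_column_metadata(
    element: SchemaElement, metadata: List[ColumnMetadata]
) -> Optional[ColumnMetadata]:
    """Return the metadata for a schema element, or None for an empty column."""
    idx = element.column_metadata_index
    if idx is None or idx < 0:
        return None  # empty column: no metadata was stored for it
    return metadata[idx]  # O(1) lookup, navigating from schema to metadata


schema = [
    SchemaElement("a", column_metadata_index=0),
    SchemaElement("b"),  # empty column, nothing stored
    SchemaElement("c", column_metadata_index=1),
]
metadata = [ColumnMetadata(num_values=10), ColumnMetadata(num_values=7)]
```

Note that the metadata list holds only two entries for three schema columns, which is the space saving for empty columns described above.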
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]