emkornfield commented on code in PR #250:
URL: https://github.com/apache/parquet-format/pull/250#discussion_r1621094106
##########
src/main/thrift/parquet.thrift:
##########
@@ -1127,18 +1229,48 @@ struct FileMetaData {
* are flattened to a list by doing a depth-first traversal.
* The column metadata contains the path in the schema for that column which can be
* used to map columns to nodes in the schema.
- * The first element is the root **/
- 2: required list<SchemaElement> schema;
+ * The first element is the root
+ *
+ * PAR1: Required
+ * PAR3: Use schema_page
+ **/
+ 2: optional list<SchemaElement> schema;
+
+ /** Page has BYTE_ARRAY data where each element is REQUIRED.
+ *
+ * Each element is a serialized SchemaElement. The order and content should
+ * have a one to one correspondence with schema.
+ */
+ 10: optional MetadataPage schema_page;
Review Comment:
My intent was to introduce an encoding that allows zero-copy random access,
which I think would be better than list<binary>, though I would guess
list<binary> might itself be slightly better than the current list. Plain
encoding is effectively equivalent to list<binary> on the wire, I believe, and
with a random-access encoding we avoid the up-front cost of decoding the list.
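
To illustrate the difference, here is a minimal Python sketch, not the encoding proposed in this PR. `encode_plain` follows the Parquet PLAIN layout for BYTE_ARRAY (a 4-byte little-endian length before each value). `encode_offsets` is a hypothetical offset-table layout that I made up purely for comparison; its header format and all function names are assumptions.

```python
import struct


def encode_plain(values):
    """PLAIN BYTE_ARRAY: each value is a 4-byte little-endian length, then the bytes."""
    return b"".join(struct.pack("<I", len(v)) + v for v in values)


def get_plain(buf, index):
    """Reach element `index` by walking every length prefix before it: O(index)."""
    pos = 0
    for _ in range(index):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4 + length
    (length,) = struct.unpack_from("<I", buf, pos)
    # memoryview slicing returns a view into buf, so the payload is not copied.
    return memoryview(buf)[pos + 4 : pos + 4 + length]


def encode_offsets(values):
    """Hypothetical layout: uint32 count, then count + 1 uint32 offsets, then the data."""
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    header = struct.pack("<I", len(values))
    header += struct.pack(f"<{len(offsets)}I", *offsets)
    return header + b"".join(values)


def get_offsets(buf, index):
    """Read two adjacent offsets and slice the data region: O(1), no scan."""
    (count,) = struct.unpack_from("<I", buf, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    start, end = struct.unpack_from("<2I", buf, 4 + 4 * index)
    data_start = 4 + 4 * (count + 1)
    return memoryview(buf)[data_start + start : data_start + end]


# Stand-ins for serialized SchemaElement blobs.
elements = [b"root", b"a", b"bcd"]
plain_buf = encode_plain(elements)
offset_buf = encode_offsets(elements)
```

Both getters return a view into the buffer, so neither copies the payload. The difference is that `get_plain` must read every preceding length prefix, while `get_offsets` jumps straight to the requested element, which is what makes lazy, per-column schema decoding cheap.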
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]