alkis commented on code in PR #250:
URL: https://github.com/apache/parquet-format/pull/250#discussion_r1622427576
##########
src/main/thrift/parquet.thrift:
##########
@@ -792,12 +806,23 @@ struct ColumnMetaData {
6: required i64 total_uncompressed_size
Review Comment:
Can we make `num_values`, `total_uncompressed_size`, `total_compressed_size`
i32s?
This doesn't matter much for Thrift, but if we are happy with such a change,
it makes a difference for other encodings like flatbuffers.
In addition num_values can be optional and if left unset it can inherit
`RowGroup.num_rows`. Most column chunks are dense and we can save repeating the
same value over and over for every column.
##########
src/main/thrift/parquet.thrift:
##########
@@ -792,12 +806,23 @@ struct ColumnMetaData {
6: required i64 total_uncompressed_size
/** total byte size of all compressed, and potentially encrypted, pages
- * in this column chunk (including the headers) **/
+ * in this column chunk (including the headers)
+ *
+ * Fetching the range of min(dictionary_page_offset, data_page_offset)
+ * + total_compressed_size should fetch all data in the the given column
+ * chunk.
+ */
7: required i64 total_compressed_size
- /** Optional key/value metadata **/
+ /** Optional key/value metadata
+ * PAR1: Optional.
+ * PAR3: Don't write use key_value_metadata instead.
+ **/
8: optional list<KeyValue> key_value_metadata
+ /** See description on FileMetata.key_value_metadata **/
+ 19: optional MetadataPage key_value_metadata_page
Review Comment:
This is unlikely to make a huge difference. In case it does, should we
consider this also:
```thrift
struct KeyVals {
1: optional list<string> keys;
2: optional list<binary> vals;
}
```
##########
src/main/thrift/parquet.thrift:
##########
@@ -1115,6 +1189,34 @@ union EncryptionAlgorithm {
2: AesGcmCtrV1 AES_GCM_CTR_V1
}
+/**
+ * Embedded metadata page.
+ *
+ * A metadata page is a data page used to store metadata about
+ * the data stored in the file. This is a key feature of PAR3
+ * footers which allow for deferred decoding of metadata.
+ *
+ * For common use cases the current recommendation is to use a
+ * an encoding that supported random access but implementations may choose
+ * other configuration parameters if necessary. Implementations
+ * SHOULD consider allowing configurability per page to allow for end-users
+ * to optimize size vs compute trade-offs that make sense for their use-case.
+ *
+ * Statistics for Metadata pages SHOULD NOT be written.
+ *
+ * Structs of this type should never be written in PAR1.
+ */
+struct MetadataPage {
+ // A serialized page including metadata thrift header and data.
+ 1: required binary page
+ // Optional compression applied to the page.
+ 2: optional CompressionCodec compression
Review Comment:
I suggest not doing this at all unless benchmarks show it is a large win.
From experience with general purpose compressors (lempel ziv style) they fail
to compress small entities like ints or ulebs. Since that's the majority of
stuff we will put here, I do not expect compression to be profitable.
##########
README.md:
##########
@@ -118,6 +118,51 @@ chunks they are interested in. The columns chunks should
then be read sequentia

+ ### PAR3 File Footers
+
+ PAR3 file footer footer format designed to better support wider-schemas and
more control
+ over the various footer size vs compute trade-offs. Its format is as follows:
+ - Serialized Thrift FileMetadata Structure
+ - (Optional) 4 byte CRC32 of the serialized Thrift FileMetadata.
+ - 4-byte length in bytes (little endian) of all preceding elements in the
footer.
+ - 4-byte little-endian flag field to indicate features that require special
parsing of the footer.
+ Readers MUST raise an error if there is an unrecognized flag. Current
flags:
+
+ * 0x01 - Footer encryption enabled (when set the encryption information
is written before
+ FileMeta structure as in the PAR1 footer).
+ * 0x02 - CRC32 of FileMetadata Footer.
Review Comment:
What are the benefits of feature flags in this place in the file?
We have `FileMetaData.version` which can serve the purpose of feature flags
for everything, except encrypted footer (since we can't otherwise read them).
Put it differently the only feature we cannot encode inside the footer itself
is if the footer is encrypted. For this it seems we can keep using a secondary
magic number forever?
##########
src/main/thrift/parquet.thrift:
##########
@@ -883,13 +928,42 @@ struct ColumnChunk {
/** Encrypted column metadata for this chunk **/
9: optional binary encrypted_column_metadata
+ /**
+ * The column order for this chunk.
+ *
+ * If not set readers should check FileMetadata.column_orders
+ * instead.
+ *
+ * Populated in both PAR1 and PAR3
+ */
+ 10: optional ColumnOrder column_order
Review Comment:
Do we have examples of other orders that might be useful?
Can drop this concept alltogether and make only one ordering available?
##########
src/main/thrift/parquet.thrift:
##########
@@ -792,12 +806,23 @@ struct ColumnMetaData {
6: required i64 total_uncompressed_size
/** total byte size of all compressed, and potentially encrypted, pages
- * in this column chunk (including the headers) **/
+ * in this column chunk (including the headers)
+ *
+ * Fetching the range of min(dictionary_page_offset, data_page_offset)
+ * + total_compressed_size should fetch all data in the the given column
+ * chunk.
+ */
7: required i64 total_compressed_size
- /** Optional key/value metadata **/
+ /** Optional key/value metadata
+ * PAR1: Optional.
+ * PAR3: Don't write use key_value_metadata instead.
+ **/
8: optional list<KeyValue> key_value_metadata
+ /** See description on FileMetata.key_value_metadata **/
+ 19: optional MetadataPage key_value_metadata_page
+
/** Byte offset from beginning of file to first data page **/
9: required i64 data_page_offset
Review Comment:
This is almost always the same as the start of the columnchunk. We should
make this optional and imply it is the same as `ColumnChunk.file_offset` if
missing.
##########
src/main/thrift/parquet.thrift:
##########
@@ -1127,18 +1229,48 @@ struct FileMetaData {
* are flattened to a list by doing a depth-first traversal.
* The column metadata contains the path in the schema for that column which
can be
* used to map columns to nodes in the schema.
- * The first element is the root **/
- 2: required list<SchemaElement> schema;
+ * The first element is the root
+ *
+ * PAR1: Required
+ * PAR3: Use schema_page
+ **/
+ 2: optional list<SchemaElement> schema;
+
+ /** Page has BYTE_ARRAY data where each element is REQUIRED.
+ *
+ * Each element is a serialized SchemaElement. The order and content should
+ * have a one to one correspondence with schema.
+ */
+ 10: optional MetadataPage schema_page;
Review Comment:
> My intent was to introduce an encoding that allows zero-copy random access
which I think would be better then list which I would guess might be slightly
better.
Benchmarks can guide this. From my experiments decoding `list<binary>` is so
much faster than `list<MyStruct>` that it doesn't show in profiles anymore.
Plus this field is O(columns). The ones we really have to optimize are the
column chunks that are O(columns * row groups).
##########
src/main/thrift/parquet.thrift:
##########
@@ -812,8 +837,20 @@ struct ColumnMetaData {
/** Set of all encodings used for pages in this column chunk.
* This information can be used to determine if all data pages are
- * dictionary encoded for example **/
+ * dictionary encoded for example
+ *
+ * PAR1: Optional. May be deprecated in a future release in favor
+ * serialized_encoding_stats.
+ * PAR3: Don't populate. Write serialized_page_encoding_stats.
+ **/
13: optional list<PageEncodingStats> encoding_stats;
+ /**
+ * Serialized page encoding stats.
+ *
+ * PAR1: Start populating after encoding_stats is deprecated.
+ * PAR3: Populate instead of encoding_stats.
+ */
+ 17: optional binary serialized_encoding_stats
Review Comment:
Can we drop `encodings`? The purpose is to detect if a reader can read a
column or not. What's going to happen if we don't have that information? Reader
will try to read and maybe will fail the query mid way. Isn't the latter an
acceptable behavior as well?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]