Re: [PR] DRAFT: Alternative Strawman proposal for a new V3 footer format in Parquet [parquet-format]

via GitHub Wed, 12 Jun 2024 20:00:44 -0700


mhaseeb123 commented on code in PR #250:
URL: https://github.com/apache/parquet-format/pull/250#discussion_r1637433667



##########
src/main/thrift/parquet.thrift:
##########
@@ -792,12 +806,23 @@ struct ColumnMetaData {
   6: required i64 total_uncompressed_size
 
   /** total byte size of all compressed, and potentially encrypted, pages 
-   *  in this column chunk (including the headers) **/
+   *  in this column chunk (including the headers) 
+   * 
+   *  Fetching the range of min(dictionary_page_offset, data_page_offset) 
+   *  + total_compressed_size should fetch all data in the the given column 
+   * chunk.
+   */
   7: required i64 total_compressed_size
 
-  /** Optional key/value metadata **/
+  /** Optional key/value metadata 
+    * PAR1: Optional. 
+    * PAR3: Don't write use key_value_metadata instead.
+    **/
   8: optional list<KeyValue> key_value_metadata
 
+  /** See description on FileMetata.key_value_metadata **/
+  19: optional MetadataPage key_value_metadata_page
+
   /** Byte offset from beginning of file to first data page **/
   9: required i64 data_page_offset

Review Comment:
   > This is almost always the same as the start of the columnchunk. We should 
make this optional and imply it is the same as `ColumnChunk.file_offset` if 
missing.
   
   Makes sense
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] DRAFT: Alternative Strawman proposal for a new V3 footer format in Parquet [parquet-format]

Reply via email to