mapleFU commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1603175571


##########
src/main/thrift/parquet.thrift:
##########
@@ -885,6 +971,44 @@ struct ColumnChunk {
   9: optional binary encrypted_column_metadata
 }
 
+struct ColumnChunkV3 {
+  /** File where column data is stored. **/
+  1: optional string file_path
+
+  /** Byte offset in file_path to the ColumnChunkMetaDataV3, optionally encrypted
+   ** CHANGED from v1: renamed to metadata_file_offset
+   **/
+  2: required i64 metadata_file_offset
+
+  /** NEW from v1: Byte length in file_path of ColumnChunkMetaDataV3, optionally encrypted
+   **/
+  3: required i64 metadata_file_length
+
+  /** REMOVED from v1: meta_data, encrypted_column_metadata.
+   ** Use encoded_metadata instead.
+   **/
+
+  /** NEW from v1: Column metadata for this chunk, duplicated here from file_path.
+   ** This is a Thrift-encoded ColumnChunkMetaDataV3, optionally encrypted.
+   **/
+  4: optional binary encoded_metadata

Review Comment:
   So the core difference here is that it uses `binary`, which allows the
   metadata to be decoded lazily?
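
   To make the question concrete, here is a minimal sketch of what lazy decoding on the reader side could look like. `LazyColumnChunk` and the `decode` callable are hypothetical names used only for illustration; they are not part of the PR or of any Parquet library, and the actual Thrift decoding step is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class LazyColumnChunk:
    """Hypothetical reader-side view of ColumnChunkV3 (illustration only)."""

    metadata_file_offset: int
    metadata_file_length: int
    # Opaque bytes: a Thrift-encoded ColumnChunkMetaDataV3, possibly encrypted.
    encoded_metadata: Optional[bytes]
    # Hypothetical decoder supplied by the caller: bytes -> decoded metadata.
    decode: Callable[[bytes], Any]
    _decoded: Any = field(default=None, init=False, repr=False)
    _is_decoded: bool = field(default=False, init=False, repr=False)

    @property
    def metadata(self) -> Any:
        # Decode on first access only; later accesses reuse the cached result.
        if not self._is_decoded:
            if self.encoded_metadata is None:
                raise ValueError(
                    "encoded_metadata is absent; read it from file_path instead"
                )
            self._decoded = self.decode(self.encoded_metadata)
            self._is_decoded = True
        return self._decoded
```

   With this shape, parsing the enclosing struct only copies the bytes; the decoding cost is paid only for the column chunks whose `metadata` is actually accessed.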



##########
README.md:
##########
@@ -113,6 +119,55 @@ chunks they are interested in.  The columns chunks should then be read sequentia
 
  ![File Layout](https://raw.github.com/apache/parquet-format/master/doc/images/FileLayout.gif)
 
+### Parquet 3
+
+Parquet 3 files have the following overall structure:
+
+```
+4-byte magic number "PAR1"
+4-byte magic number "PAR3"
+8-byte offset of File Metadata v3
+8-byte length of File Metadata v3
+
+<Column 1 Chunk 1 + Column Metadata>
+<Column 2 Chunk 1 + Column Metadata>
+...
+<Column N Chunk 1 + Column Metadata>
+<Column 1 Chunk 2 + Column Metadata>
+<Column 2 Chunk 2 + Column Metadata>
+...
+<Column N Chunk 2 + Column Metadata>
+...
+<Column 1 Chunk M + Column Metadata>
+<Column 2 Chunk M + Column Metadata>
+...
+<Column N Chunk M + Column Metadata>
+
+<File-level Column 1 Metadata v3>
+...
+<File-level Column N Metadata v3>

Review Comment:
   Aha, so the concept of a "row group" is removed now? Does "chunk" here mean
   "group" or "page"?

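   For reference, a minimal sketch of how a reader might parse the 24-byte header proposed in the README hunk above. The field order and sizes come from the hunk; the little-endian, signed 64-bit encoding of the offset and length is an assumption (the hunk does not state it), and `parse_par3_header` is a hypothetical helper name.

```python
import struct
from typing import Tuple

# 4-byte "PAR1" + 4-byte "PAR3" + 8-byte offset + 8-byte length.
HEADER_SIZE = 24


def parse_par3_header(buf: bytes) -> Tuple[int, int]:
    """Return (offset, length) of File Metadata v3 from the file header.

    Assumption: both integers are little-endian signed 64-bit values.
    """
    if len(buf) < HEADER_SIZE:
        raise ValueError("header too short")
    if buf[0:4] != b"PAR1" or buf[4:8] != b"PAR3":
        raise ValueError("missing PAR1/PAR3 magic numbers")
    offset, length = struct.unpack_from("<qq", buf, 8)
    return offset, length
```

   A reader would then seek to `offset` and read `length` bytes to obtain the File Metadata v3.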


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

