pitrou commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1603218899


##########
README.md:
##########
@@ -113,6 +119,55 @@ chunks they are interested in.  The columns chunks should then be read sequentia
 
  ![File Layout](https://raw.github.com/apache/parquet-format/master/doc/images/FileLayout.gif)
 
+### Parquet 3
+
+Parquet 3 files have the following overall structure:
+
+```
+4-byte magic number "PAR1"
+4-byte magic number "PAR3"
+8-byte offset of File Metadata v3
+8-byte length of File Metadata v3
+
+<Column 1 Chunk 1 + Column Metadata>
+<Column 2 Chunk 1 + Column Metadata>
+...
+<Column N Chunk 1 + Column Metadata>
+<Column 1 Chunk 2 + Column Metadata>
+<Column 2 Chunk 2 + Column Metadata>
+...
+<Column N Chunk 2 + Column Metadata>
+...
+<Column 1 Chunk M + Column Metadata>
+<Column 2 Chunk M + Column Metadata>
+...
+<Column N Chunk M + Column Metadata>
+
+<File-level Column 1 Metadata v3>
+...
+<File-level Column N Metadata v3>
+File Metadata v3
+
+File Metadata
+4-byte length in bytes of file metadata (little endian)
+4-byte magic number "PAR1"
+```
+
+The File Metadata v3 is designed to be light-weight to decode, regardless of
+the number of columns in the file. Individual column metadata can be opportunistically
+decoded depending on actual needs.
+
+This file structure is backwards-compatible. Parquet 1 readers will read the
+legacy File Metadata in the file footer, while Parquet 3 readers will notice
+the "PAR1PAR3" magic number (probably by reading the 24 first bytes in the file)
+and will instead read the File Metadata v3.

Review Comment:
   Quoting the structure above:
   ```
   4-byte magic number "PAR1"
   4-byte magic number "PAR3"
   8-byte offset of File Metadata v3
   8-byte length of File Metadata v3
   ```
   
   Does it answer your question?
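   For illustration, here is a minimal Python sketch of how a reader could use that 24-byte header to choose between File Metadata v3 and the legacy footer. It is a sketch under stated assumptions, not the proposed implementation. The function name `locate_metadata` is hypothetical. The diff does not state the byte order of the 8-byte offset and length fields, so the sketch assumes unsigned little-endian, matching the legacy 4-byte footer length. For simplicity, it also takes the whole file as a `bytes` object rather than issuing ranged reads.

   ```python
   import struct

   # Assumption: the 8-byte offset and length of File Metadata v3 are unsigned
   # little-endian integers. The quoted diff does not specify their byte order.
   _V3_HEADER = struct.Struct("<4s4sQQ")  # "PAR1" + "PAR3" + offset + length = 24 bytes


   def locate_metadata(data: bytes) -> tuple[str, int, int]:
       """Hypothetical helper: return (version, offset, length) of the metadata to decode."""
       # Every file, legacy or v3, still ends with the "PAR1" magic number.
       if len(data) < 12 or data[-4:] != b"PAR1":
           raise ValueError("not a Parquet file")
       # A Parquet 3 reader checks the first 24 bytes for "PAR1PAR3".
       if len(data) >= _V3_HEADER.size:
           magic1, magic3, offset, length = _V3_HEADER.unpack_from(data, 0)
           if magic1 == b"PAR1" and magic3 == b"PAR3":
               return ("v3", offset, length)
       # Legacy path: a 4-byte little-endian length sits just before the trailing
       # "PAR1", and the File Metadata immediately precedes that length field.
       (length,) = struct.unpack_from("<I", data, len(data) - 8)
       return ("v1", len(data) - 8 - length, length)
   ```

   A Parquet 1 reader never looks at the leading bytes, so it takes the legacy path regardless of what follows the initial "PAR1". That is what makes the layout backwards-compatible.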


