emkornfield commented on code in PR #544:
URL: https://github.com/apache/parquet-format/pull/544#discussion_r2677177748


##########
src/main/flatbuf/parquet3.fbs:
##########
@@ -0,0 +1,224 @@
+namespace parquet.format3;
+
+// Optimization notes
+// 1. Statistics are stored in integral types if their size is fixed, otherwise prefix + suffix
+// 2. ColumnMetaData.encoding_stats are removed, they are replaced with
+//    ColumnMetaData.is_fully_dict_encoded.
+// 3. RowGroups are limited to 2GB in size, so we can use int for sizes.
+// 4. ColumnChunk/ColumnMetaData offsets are now relative to the start of the row group, so we can
+//    use int for offsets.
+// 5. Remove ordinal.
+// 6. Restrict RowGroups to 2^31-1 rows.
+// 7. Remove offset/column indexes, they are small and just their offsets are of similar size.
+
+///////////////////////////////////////////////////////////////////////////////////////////////////
+// Physical types.
+///////////////////////////////////////////////////////////////////////////////////////////////////
+
+enum Type : byte {
+  BOOLEAN = 0,
+  INT32 = 1,
+  INT64 = 2,
+  INT96 = 3,
+  FLOAT = 4,
+  DOUBLE = 5,
+  BYTE_ARRAY = 6,
+  FIXED_LEN_BYTE_ARRAY = 7,
+}
+
+enum FieldRepetitionType : byte {
+  REQUIRED = 0,
+  OPTIONAL = 1,
+  REPEATED = 2,
+}
+
+///////////////////////////////////////////////////////////////////////////////////////////////////
+// Encodings.
+///////////////////////////////////////////////////////////////////////////////////////////////////
+
+// Note: Match the thrift enum values so that we can cast between them.
+enum Encoding : byte {
+  PLAIN = 0,
+  // GROUP_VAR_INT = 1,
+  PLAIN_DICTIONARY = 2,
+  RLE = 3,
+  // BIT_PACKED = 4,
+  DELTA_BINARY_PACKED = 5,
+  DELTA_LENGTH_BYTE_ARRAY = 6,
+  DELTA_BYTE_ARRAY = 7,
+  RLE_DICTIONARY = 8,
+  BYTE_STREAM_SPLIT = 9,
+}
+
+// Note: Match the thrift enum values so that we can cast between them.
+enum CompressionCodec : byte {
+  UNCOMPRESSED = 0,
+  SNAPPY = 1,
+  GZIP = 2,
+  LZO = 3,
+  BROTLI = 4,
+  // LZ4 = 5,
+  ZSTD = 6,
+  LZ4_RAW = 7,
+}
+
+///////////////////////////////////////////////////////////////////////////////////////////////////
+// Logical types.
+///////////////////////////////////////////////////////////////////////////////////////////////////
+
+table Empty {}

Review Comment:
   If we intend this to be the new footer, I think we want detailed docs here,
   at the same level of detail as parquet.thrift?
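
   As a rough sketch of the level of detail meant here, the `Type` enum from
   this diff could carry `///` documentation comments, which FlatBuffers
   treats as doc comments and passes through to generated code. The comment
   wording below is an assumption, adapted from the corresponding comments in
   parquet.thrift; it is illustrative, not proposed text.

   ```
   /// Types supported by Parquet. These types are intended to be used in
   /// combination with the encodings to control the on-disk storage format.
   /// For example, INT16 is not included as a type since a good encoding of
   /// INT32 would handle this.
   enum Type : byte {
     BOOLEAN = 0,
     INT32 = 1,
     INT64 = 2,
     /// Deprecated, only used by legacy implementations.
     INT96 = 3,
     FLOAT = 4,
     DOUBLE = 5,
     BYTE_ARRAY = 6,
     FIXED_LEN_BYTE_ARRAY = 7,
   }
   ```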



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

