This is an automated email from the ASF dual-hosted git repository.

fokko pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new dff0b3e  GH-459: Add Variant logical type annotation (#460)
dff0b3e is described below

commit dff0b3e6f02ed28e6b0753e921d53de53e63f506
Author: Gene Pang <[email protected]>
AuthorDate: Wed Nov 6 09:59:24 2024 -0800

    GH-459: Add Variant logical type annotation (#460)
    
    * Add Variant as logical type
    
    * Update shredding to use
    
    * Revert changes to the spec
    
    * Update logical types Variant details
    
    * Clarify BYTE_ARRAY and remove underscore fields
---
 LogicalTypes.md                | 36 ++++++++++++++++++++++++++++++++++++
 src/main/thrift/parquet.thrift |  8 ++++++++
 2 files changed, 44 insertions(+)

diff --git a/LogicalTypes.md b/LogicalTypes.md
index 1b7d5c2..3aa5ceb 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -563,6 +563,42 @@ defined by the [BSON specification][bson-spec].
 
 The sort order used for `BSON` is unsigned byte-wise comparison.
 
+### VARIANT
+
+`VARIANT` is used for a Variant value. It must annotate a group. The group must
+contain a field named `metadata` and a field named `value`. Both fields must have
+type `binary`, which is also called `BYTE_ARRAY` in the Parquet thrift definition.
+The `VARIANT` annotated group can be used to store either an unshredded Variant
+value, or a shredded Variant value.
+
+* The Variant group must be annotated with the `VARIANT` logical type.
+* Both fields `value` and `metadata` must be of type `binary` (called `BYTE_ARRAY`
+  in the Parquet thrift definition).
+* The `metadata` field is required and must be a valid Variant metadata component,
+  as defined by the [Variant binary encoding specification](VariantEncoding.md).
+* When present, the `value` field must be a valid Variant value component,
+  as defined by the [Variant binary encoding specification](VariantEncoding.md).
+* The `value` field is required for unshredded Variant values.
+* The `value` field is optional and may be null only when parts of the Variant
+  value are shredded according to the [Variant shredding specification](VariantShredding.md).
+
+This is the expected representation of an unshredded Variant in Parquet:
+```
+optional group variant_unshredded (VARIANT) {
+  required binary metadata;
+  required binary value;
+}
+```
+
+This is an example representation of a shredded Variant in Parquet:
+```
+optional group variant_shredded (VARIANT) {
+  required binary metadata;
+  optional binary value;
+  optional int64 typed_value;
+}
+```
+
 ## Nested Types
 
 This section specifies how `LIST` and `MAP` can be used to encode nested types
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 83457fe..5d4431d 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -380,6 +380,12 @@ struct JsonType {
 struct BsonType {
 }
 
+/**
+ * Embedded Variant logical type annotation
+ */
+struct VariantType {
+}
+
 /**
  * LogicalType annotations to replace ConvertedType.
  *
@@ -410,6 +416,7 @@ union LogicalType {
   13: BsonType BSON           // use ConvertedType BSON
   14: UUIDType UUID           // no compatible ConvertedType
   15: Float16Type FLOAT16     // no compatible ConvertedType
+  16: VariantType VARIANT     // no compatible ConvertedType
 }
 
 /**
@@ -980,6 +987,7 @@ union ColumnOrder {
    *   ENUM - unsigned byte-wise comparison
    *   LIST - undefined
    *   MAP - undefined
+   *   VARIANT - undefined
    *
    * In the absence of logical types, the sort order is determined by the physical type:
    *   BOOLEAN - false, true
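
The structural rules this commit adds to LogicalTypes.md for a `VARIANT` group can be sketched as a small schema check. This is only an illustration and not part of the commit or of any Parquet library: the `Field` class and `check_variant_group` function are hypothetical names, and treating the presence of a `typed_value` field as the sign of a shredded Variant is an assumption taken from the shredded example in the diff. The check covers schema shape only, not whether the bytes are a valid Variant encoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    # Hypothetical stand-in for a Parquet schema node; not a real library type.
    name: str
    repetition: str                      # "required" or "optional"
    physical_type: Optional[str] = None  # e.g. "binary"; None for a group
    logical_type: Optional[str] = None   # e.g. "VARIANT"
    children: List["Field"] = field(default_factory=list)


def check_variant_group(group: Field) -> List[str]:
    """Return the list of rule violations; an empty list means none were found."""
    errors = []
    if group.logical_type != "VARIANT":
        errors.append("group is not annotated with VARIANT")
    if group.physical_type is not None:
        errors.append("VARIANT must annotate a group")

    by_name = {child.name: child for child in group.children}

    # Both `metadata` and `value` must exist and be of type binary.
    for name in ("metadata", "value"):
        child = by_name.get(name)
        if child is None:
            errors.append(f"missing field '{name}'")
        elif child.physical_type != "binary":
            errors.append(f"field '{name}' must be binary")

    # `metadata` is always required.
    metadata = by_name.get("metadata")
    if metadata is not None and metadata.repetition != "required":
        errors.append("field 'metadata' must be required")

    # `value` may be optional only for a shredded Variant.
    # Assumption: a `typed_value` field marks the group as shredded.
    value = by_name.get("value")
    shredded = "typed_value" in by_name
    if value is not None and value.repetition != "required" and not shredded:
        errors.append("field 'value' must be required when unshredded")

    return errors
```

Passing the two example groups from the diff (`variant_unshredded` and `variant_shredded`) through this check would be expected to report no violations, while an unshredded group with an optional `value` would be flagged.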
