This is an automated email from the ASF dual-hosted git repository.
fokko pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git
The following commit(s) were added to refs/heads/master by this push:
new dff0b3e GH-459: Add Variant logical type annotation (#460)
dff0b3e is described below
commit dff0b3e6f02ed28e6b0753e921d53de53e63f506
Author: Gene Pang <[email protected]>
AuthorDate: Wed Nov 6 09:59:24 2024 -0800
GH-459: Add Variant logical type annotation (#460)
* Add Variant as logical type
* Update shredding to use
* Revert changes to the spec
* Update logical types Variant details
* Clarify BYTE_ARRAY and remove underscore fields
---
LogicalTypes.md | 36 ++++++++++++++++++++++++++++++++++++
src/main/thrift/parquet.thrift | 8 ++++++++
2 files changed, 44 insertions(+)
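
Reader's note (not part of the commit): the `VARIANT` group rules added to LogicalTypes.md in the diff below can be sketched as a small schema check. This is a minimal illustration, not code from parquet-format or any Parquet library. The `Field` class and `variant_group_errors` function are hypothetical names. Treating the presence of a `typed_value` field as the sign of a shredded Variant is an assumption taken from the shredded example in the diff; the full rules live in the Variant shredding specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a Parquet schema node."""
    name: str
    repetition: str                      # "required" or "optional"
    physical_type: Optional[str] = None  # e.g. "binary"; None for a group
    logical_type: Optional[str] = None   # e.g. "VARIANT"
    children: List["Field"] = field(default_factory=list)


def variant_group_errors(group: Field) -> List[str]:
    """Return the rule violations found in a VARIANT-annotated group."""
    errors: List[str] = []
    if group.physical_type is not None:
        errors.append("VARIANT must annotate a group")
    if group.logical_type != "VARIANT":
        errors.append("group must be annotated with VARIANT")

    by_name = {child.name: child for child in group.children}
    # Assumption: a typed_value field marks the group as shredded.
    shredded = "typed_value" in by_name

    # Both metadata and value must exist and be binary (BYTE_ARRAY).
    for name in ("metadata", "value"):
        child = by_name.get(name)
        if child is None:
            errors.append(f"missing field '{name}'")
        elif child.physical_type != "binary":
            errors.append(f"field '{name}' must be binary")

    metadata = by_name.get("metadata")
    if metadata is not None and metadata.repetition != "required":
        errors.append("field 'metadata' must be required")

    # value may be optional only when the Variant is shredded.
    value = by_name.get("value")
    if value is not None and value.repetition != "required" and not shredded:
        errors.append("field 'value' must be required when unshredded")
    return errors


# The two example schemas from the diff, expressed with the sketch above.
unshredded = Field("variant_unshredded", "optional", logical_type="VARIANT", children=[
    Field("metadata", "required", "binary"),
    Field("value", "required", "binary"),
])
shredded = Field("variant_shredded", "optional", logical_type="VARIANT", children=[
    Field("metadata", "required", "binary"),
    Field("value", "optional", "binary"),
    Field("typed_value", "optional", "int64"),
])
```

Calling `variant_group_errors(unshredded)` or `variant_group_errors(shredded)` returns an empty list for both examples. The sketch checks only the schema shape; it does not verify that the bytes in `metadata` or `value` are valid under the Variant binary encoding specification.
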
diff --git a/LogicalTypes.md b/LogicalTypes.md
index 1b7d5c2..3aa5ceb 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -563,6 +563,42 @@ defined by the [BSON specification][bson-spec].
The sort order used for `BSON` is unsigned byte-wise comparison.
+### VARIANT
+
+`VARIANT` is used for a Variant value. It must annotate a group. The group must
+contain a field named `metadata` and a field named `value`. Both fields must have
+type `binary`, which is also called `BYTE_ARRAY` in the Parquet thrift definition.
+The `VARIANT` annotated group can be used to store either an unshredded Variant
+value, or a shredded Variant value.
+
+* The Variant group must be annotated with the `VARIANT` logical type.
+* Both fields `value` and `metadata` must be of type `binary` (called `BYTE_ARRAY`
+ in the Parquet thrift definition).
+* The `metadata` field is required and must be a valid Variant metadata component,
+ as defined by the [Variant binary encoding specification](VariantEncoding.md).
+* When present, the `value` field must be a valid Variant value component,
+ as defined by the [Variant binary encoding specification](VariantEncoding.md).
+* The `value` field is required for unshredded Variant values.
+* The `value` field is optional and may be null only when parts of the Variant
+ value are shredded according to the [Variant shredding specification](VariantShredding.md).
+
+This is the expected representation of an unshredded Variant in Parquet:
+```
+optional group variant_unshredded (VARIANT) {
+ required binary metadata;
+ required binary value;
+}
+```
+
+This is an example representation of a shredded Variant in Parquet:
+```
+optional group variant_shredded (VARIANT) {
+ required binary metadata;
+ optional binary value;
+ optional int64 typed_value;
+}
+```
+
## Nested Types
This section specifies how `LIST` and `MAP` can be used to encode nested types
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 83457fe..5d4431d 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -380,6 +380,12 @@ struct JsonType {
struct BsonType {
}
+/**
+ * Embedded Variant logical type annotation
+ */
+struct VariantType {
+}
+
/**
* LogicalType annotations to replace ConvertedType.
*
@@ -410,6 +416,7 @@ union LogicalType {
13: BsonType BSON // use ConvertedType BSON
14: UUIDType UUID // no compatible ConvertedType
15: Float16Type FLOAT16 // no compatible ConvertedType
+ 16: VariantType VARIANT // no compatible ConvertedType
}
/**
@@ -980,6 +987,7 @@ union ColumnOrder {
* ENUM - unsigned byte-wise comparison
* LIST - undefined
* MAP - undefined
+ * VARIANT - undefined
*
* In the absence of logical types, the sort order is determined by the physical type:
* BOOLEAN - false, true