[ https://issues.apache.org/jira/browse/HIVE-29183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HIVE-29183:
----------------------------------
    Labels: pull-request-available  (was: )

> Integrating Variant Type into Hive
> ----------------------------------
>
>                 Key: HIVE-29183
>                 URL: https://issues.apache.org/jira/browse/HIVE-29183
>             Project: Hive
>          Issue Type: New Feature
>          Components: Hive, Iceberg integration, SQL
>            Reporter: Denys Kuzmenko
>            Priority: Major
>              Labels: pull-request-available
>
> A variant is a value that stores semi-structured data. The structure and data
> types in a variant are not necessarily consistent across rows in a table or
> data file. The variant type and binary encoding are defined in the Parquet
> project, with support currently available for V1. Support for Variant is
> added in Iceberg v3.
> Variants are similar to JSON, with a wider set of primitive values including
> date, timestamp, timestamptz, binary, and decimals.
> Variant values may contain nested types:
> * An array is an ordered collection of variant values.
> * An object is a collection of fields, each consisting of a string key and a
> variant value.
> As a semi-structured type, variant differs from Iceberg's other types in
> important ways:
> * Variant arrays are similar to lists, but may contain any variant value
> rather than a fixed element type.
> * Variant objects are similar to structs, but may contain variable fields
> identified by name, and field values may be any variant value rather than a
> fixed field type.
> Variant data types allow for the efficient binary encoding of dynamic
> semi-structured data such as JSON, Avro, Parquet, etc. By encoding
> semi-structured data as a variant column, we retain the flexibility of the
> source data, while allowing query engines to operate on the data more
> efficiently.
> With support for the Variant type, such data can be stored internally in an
> efficient binary representation for better performance. Without it, the data
> has to be parsed from its original format, which is inefficient.
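
To make the data model described above more concrete, here is a minimal Python sketch of the variant shape: primitives, arrays of arbitrary variant values, and objects with string keys. It is only an illustration under stated assumptions. The name `is_variant`, the choice of Python types standing in for variant primitives, and the sample rows are all hypothetical; this is not the Parquet binary encoding and not Hive or Iceberg code.

```python
from datetime import date, datetime
from decimal import Decimal

# Python types standing in for variant primitives. This mapping is an
# assumption for illustration; the authoritative list is in the Parquet spec.
PRIMITIVES = (type(None), bool, int, float, str, bytes, Decimal, date, datetime)


def is_variant(value):
    """Return True if value fits the variant shape described in the issue."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list):
        # Array: ordered collection whose elements may be any variant value.
        return all(is_variant(item) for item in value)
    if isinstance(value, dict):
        # Object: every key is a string, every field value is a variant.
        return all(
            isinstance(key, str) and is_variant(item)
            for key, item in value.items()
        )
    return False


# Made-up rows: each has a different structure, which a variant column permits.
rows = [
    {"dealer": {"ship": "example"}, "price": Decimal("100.50")},
    {"sold_on": date(2025, 1, 15), "tags": ["a", 4, None]},
    "a bare string",
]
```

Unlike a struct or list column with a fixed schema, all three rows pass the same check even though they share no fields, which is the flexibility the description refers to.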
> This will allow the following use cases:
> * Create an Iceberg table with a Variant column
> CREATE TABLE IF NOT EXISTS car_sales(record Variant);
> * Insert semi-structured data into the Variant column
> INSERT INTO car_sales SELECT PARSE_JSON(<json_string>)
> * Query against the semi-structured data
> SELECT VARIANT_GET(record, '$.dealer.ship', 'string') FROM car_sales
>
> Variant Binary Encoding:
> https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
> Iceberg's Variant type proposal:
> https://docs.google.com/document/d/1sq70XDiWJ2DemWyA5dVB80gKzwi0CWoM0LOWM7VJVd8



--
This message was sent by Atlassian Jira
(v8.20.10#820010)