alamb commented on issue #75:
URL: https://github.com/apache/parquet-testing/issues/75#issuecomment-2800088160
I played around with the Variant type in the Spark 4.0 preview today and figured
out how to generate Variant columns:
# Here is an example of how to make variant columns
```sql
-- Run in spark 4.0 preview
--
-- Remove local catalog first
-- rm -rf spark-warehouse/
DROP TABLE IF EXISTS T;
CREATE TABLE T (id INT, variant_col VARIANT);
INSERT INTO T VALUES (1, parse_json('{"foo": "bar", "baz": 42}'));
INSERT INTO T VALUES (2, parse_json('{"baz": 32}'));
```
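As a quick sanity check before looking at the Parquet file, the inserted rows can be inspected from Spark itself. This is a minimal sketch that assumes the Spark 4.0 built-ins `to_json` and `schema_of_variant` accept Variant input. Neither function appears in the original example. With the default warehouse setting, the Parquet files for `T` should land under `spark-warehouse/t/`.

```sql
-- Assumes table T from the example above (Spark 4.0).
-- Render each stored Variant back to JSON text and show the schema Spark infers for it.
SELECT id, to_json(variant_col), schema_of_variant(variant_col) FROM T ORDER BY id;
```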
# Generated Parquet File:
`variant_col` is stored in Parquet as a struct with two non-nullable fields:
* `value`: Binary
* `metadata`: Binary
```sql
> describe
'part-00000-c13d3cac-027c-4ffc-acdd-c5ba41e2f6b7-c000.snappy.parquet';
+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| column_name | data_type
| is_nullable |
+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| id | Int32
| NO |
| variant_col | Struct([Field { name: "value", data_type: Binary, nullable:
false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name:
"metadata", data_type: Binary, nullable: false, dict_id: 0, dict_is_ordered:
false, metadata: {} }]) | NO |
+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
2 row(s) fetched.
Elapsed 0.003 seconds.
```
BTW here is how to access variant fields (using `try_variant_get`):
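```sql
-- https://docs.databricks.com/aws/en/sql/language-manual/functions/try_variant_get
-- Without a type argument, the extracted value is returned as a VARIANT
SELECT try_variant_get(variant_col, '$.foo') from T;
-- With a type argument, the value is cast to that type;
-- the result is NULL if the path is missing or the cast fails
SELECT try_variant_get(variant_col, '$.foo', 'string') from T;
SELECT try_variant_get(variant_col, '$.foo', 'timestamp') from T;
SELECT try_variant_get(variant_col, '$.baz', 'timestamp') from T;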
```
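For comparison, here is a sketch of the non-`try_` form, assuming Spark 4.0's `variant_get` behaves as documented. It takes the same arguments and returns NULL for a missing path, but it raises an error when the cast fails instead of returning NULL. The failing query is left commented out so the rest of the snippet runs.

```sql
-- Assumes table T from the example above (Spark 4.0).
-- Missing path: both functions return NULL (row 2 has no "foo" field).
SELECT variant_get(variant_col, '$.foo', 'string') FROM T;
-- Successful cast: both functions return 42 and 32 as INT.
SELECT variant_get(variant_col, '$.baz', 'int') FROM T;
-- Failed cast: try_variant_get returns NULL for "bar" ...
SELECT try_variant_get(variant_col, '$.foo', 'int') FROM T;
-- ... while variant_get is expected to raise an error on the same value.
-- SELECT variant_get(variant_col, '$.foo', 'int') FROM T;
```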