[I] Implement VariantVisitor (parquet) to support MERGE INTO operations [iceberg]

via GitHub Fri, 28 Nov 2025 05:52:56 -0800


enriquh opened a new issue, #14707:
URL: https://github.com/apache/iceberg/issues/14707


   ### Feature Request / Improvement
   
   **Feature description**
   
   Implement the Parquet schema visitor for the variant data type. The `variant` method of `TypeWithSchemaVisitor` is currently not implemented; see the 
[code](https://github.com/apache/iceberg/blob/cf27769762d678a8a790481318b6346c2dd480ff/parquet/src/main/java/org/apache/iceberg/parquet/TypeWithSchemaVisitor.java#L242)
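
As a rough illustration of this failure mode, here is a toy Python model of a schema visitor whose variant handler is left unimplemented. It is not Iceberg's Java code: `Field`, `ToySchemaVisitor`, `VariantAwareVisitor`, and `visit` are hypothetical names invented for this sketch. It only shows why any traversal that reaches a variant field fails until a handler is supplied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Toy schema field; `kind` is a type name such as 'long' or 'variant'."""
    name: str
    kind: str


class ToySchemaVisitor:
    """Hypothetical base visitor: primitives are handled, variant is not."""

    def primitive(self, field):
        return f"{field.name}:{field.kind}"

    def variant(self, field):
        # Models the reported behavior: the default handler raises.
        raise NotImplementedError("Not implemented for variant")


class VariantAwareVisitor(ToySchemaVisitor):
    """Hypothetical subclass that supplies the missing variant handler."""

    def variant(self, field):
        return f"{field.name}:variant"


def visit(schema, visitor):
    """Dispatch every field of the schema to the matching visitor method."""
    results = []
    for field in schema:
        if field.kind == "variant":
            results.append(visitor.variant(field))
        else:
            results.append(visitor.primitive(field))
    return results


# Same column layout as the table in the reproduction.
SCHEMA = [Field("id", "long"), Field("variant_data", "variant")]
```

In this model, `visit(SCHEMA, ToySchemaVisitor())` raises as soon as it reaches `variant_data`, while `visit(SCHEMA, VariantAwareVisitor())` completes.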
   
   **Steps to reproduce**
   
   1. Create a table with a variant field
   2. Perform a MERGE INTO operation to update records on the target table, using a variant property in the condition
   3. The issue also occurs when no sub-variant extraction is included in the condition.
   
   **Result**
   
   `UnsupportedOperationException: Not implemented for variant
   at org.apache.iceberg.parquet.TypeWithSchemaVisitor.variant(TypeWithSchemaVisitor.java:242)`
   
   > spark.sql(f"""
   >             CREATE TABLE {table_name} (
   >                 id BIGINT,
   >                 variant_data VARIANT
   >             ) USING iceberg
   >             TBLPROPERTIES ('format-version' = '3')
   >         """)
   
   > merge_sql_2 = f"""
   >             MERGE INTO {table_name} AS target
   >             USING merge_source AS source
   >             ON variant_get(target.variant_data, '$.name', 'string') = variant_get(source.variant_data, '$.name', 'string')
   >                AND target.id = source.id
   >             WHEN MATCHED THEN
   >                 UPDATE SET target.variant_data = source.variant_data
   >             WHEN NOT MATCHED THEN
   >                 INSERT (id, variant_data) VALUES (source.id, source.variant_data)
   >         """
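
The snippets above do not show how `merge_source` is defined or how the MERGE statement is executed. A minimal sketch of a complete reproduction follows, under stated assumptions: the one-row source view built with `parse_json`, the seeding INSERT into the target, and the helper names `build_merge_repro` and `run_merge_repro` are not from the original report, and `spark` is assumed to be a Spark 4.0 `SparkSession` with an Iceberg catalog configured.

```python
def build_merge_repro(table_name):
    """Return the SQL statements for the reproduction, in execution order."""
    create_table = f"""
        CREATE TABLE {table_name} (
            id BIGINT,
            variant_data VARIANT
        ) USING iceberg
        TBLPROPERTIES ('format-version' = '3')
    """
    # Assumption: a one-row source view; the issue does not show its definition.
    create_source = """
        CREATE OR REPLACE TEMPORARY VIEW merge_source AS
        SELECT CAST(1 AS BIGINT) AS id,
               parse_json('{"name": "a"}') AS variant_data
    """
    # Assumption: seed the target so the MERGE has existing rows to read.
    seed_target = f"""
        INSERT INTO {table_name}
        SELECT id, variant_data FROM merge_source
    """
    # The MERGE statement from the issue, unchanged apart from whitespace.
    merge = f"""
        MERGE INTO {table_name} AS target
        USING merge_source AS source
        ON variant_get(target.variant_data, '$.name', 'string') =
           variant_get(source.variant_data, '$.name', 'string')
           AND target.id = source.id
        WHEN MATCHED THEN
            UPDATE SET target.variant_data = source.variant_data
        WHEN NOT MATCHED THEN
            INSERT (id, variant_data) VALUES (source.id, source.variant_data)
    """
    return [create_table, create_source, seed_target, merge]


def run_merge_repro(spark, table_name):
    """Execute each statement through `spark.sql`, stopping at the first error."""
    for statement in build_merge_repro(table_name):
        spark.sql(statement)
```

Calling `run_merge_repro(spark, table_name)` is expected to reach the MERGE statement and fail there with the exception shown under **Result**, provided the assumed setup matches the reporter's environment.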
   
   **Expected results**
   
   MERGE operations should run on tables with variant fields, including when the condition uses 
[variant_get](https://spark.apache.org/docs/4.0.0/api/python/reference/pyspark.sql/api/pyspark.sql.functions.variant_get.html)
 in Spark to reduce scanned data.
   
   **Environment details**
   
   * Spark 4.0
   * Iceberg 1.11 (built from main)
   
   ### Query engine
   
   Spark
   
   ### Willingness to contribute
   
   - [ ] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [x] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

