[ 
https://issues.apache.org/jira/browse/HIVE-29190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019592#comment-18019592
 ] 

Denys Kuzmenko commented on HIVE-29190:
---------------------------------------

Merged to master
Thanks for the fix, [~ayushtkn]!

> Iceberg: [V3] Fix handling of Delete/Update with DV's
> -----------------------------------------------------
>
>                 Key: HIVE-29190
>                 URL: https://issues.apache.org/jira/browse/HIVE-29190
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, if we run a delete or update on a V3 table and the DataFile being 
> operated on already has a deletion vector (DV), subsequent queries fail.
> {noformat}
>  org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException: Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1757524880780_0001_5_00, 
> diagnostics=[Vertex vertex_1757524880780_0001_5_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ice01 initializer failed, 
> vertex=vertex_1757524880780_0001_5_00 [Map 1], 
> org.apache.iceberg.exceptions.ValidationException: Can't index multiple DVs 
> for 
> hdfs://localhost:51198/build/ql/test/data/warehouse/ice01/data/00000-0-ayushsaxena_20250910102137_22c067b7-d899-4ecf-9832-c167f7d402a6-job_17575248807800_0001-1-00001.orc:
>  
> DV{location=hdfs://localhost:51198/build/ql/test/data/warehouse/ice01/data/00000-0-ayushsaxena_20250910102141_dc808c4f-746c-4a38-b2dc-ba6b8d719f44-job_17575248807800_0001-2-00001-pos-deletes.orc,
>  offset=4, length=42, 
> referencedDataFile=hdfs://localhost:51198/build/ql/test/data/warehouse/ice01/data/00000-0-ayushsaxena_20250910102137_22c067b7-d899-4ecf-9832-c167f7d402a6-job_17575248807800_0001-1-00001.orc}
>  and 
> DV{location=hdfs://localhost:51198/build/ql/test/data/warehouse/ice01/data/00000-0-ayushsaxena_20250910102142_300e55da-c854-415e-854b-6a0b9ac641da-job_17575248807800_0001-3-00001-pos-deletes.orc,
>  offset=4, length=44, 
> referencedDataFile=hdfs://localhost:51198/build/ql/test/data/warehouse/ice01/data/00000-0-ayushsaxena_20250910102137_22c067b7-d899-4ecf-9832-c167f7d402a6-job_17575248807800_0001-1-00001.orc}
>         at org.apache.iceberg.DeleteFileIndex$Builder.add(DeleteFileIndex.java:509)
>         at org.apache.iceberg.DeleteFileIndex$Builder.build(DeleteFileIndex.java:481)
>         at org.apache.iceberg.ManifestGroup.plan(ManifestGroup.java:185)
>         at org.apache.iceberg.ManifestGroup.planFiles(ManifestGroup.java:172)
>         at org.apache.iceberg.DataTableScan.doPlanFiles(DataTableScan.java:90)
>         at org.apache.iceberg.SnapshotScan.planFiles(SnapshotScan.java:139)
>         at org.apache.iceberg.BaseTableScan.planTasks(BaseTableScan.java:44)
>         at org.apache.iceberg.DataTableScan.planTasks(DataTableScan.java:26)
>         at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.generateInputSplits(IcebergInputFormat.java:230)
>         at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.planInputSplits(IcebergInputFormat.java:199)
>         at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.getSplits(IcebergInputFormat.java:172)
>         at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat.getSplits(MapredIcebergInputFormat.java:69)
>         at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getSplits(HiveIcebergInputFormat.java:167)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:585)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:880)
>         at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:363)
> {noformat}
> The reason: Iceberg V3 allows only one DV per DataFile, so scan planning 
> fails as soon as it finds a second DV referencing the same file.
> Related Iceberg code:
> https://github.com/apache/iceberg/blob/720ef99720a1c59e4670db983c951243dffc4f3e/core/src/main/java/org/apache/iceberg/DeleteFileIndex.java#L507-L509
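
For readers unfamiliar with the constraint, here is a minimal sketch of the kind of check that produces the error above. It is not Iceberg's DeleteFileIndex code: the class DvIndexSketch, its methods, and the use of plain strings for file locations are all hypothetical, chosen only to illustrate "one DV per data file".

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a toy index, not Iceberg's DeleteFileIndex. */
public class DvIndexSketch {
  // Maps a data file location to the single DV that references it.
  private final Map<String, String> dvByDataFile = new HashMap<>();

  /** Registers a DV and rejects a second DV for the same data file. */
  public void add(String dataFile, String dvLocation) {
    String existing = dvByDataFile.putIfAbsent(dataFile, dvLocation);
    if (existing != null) {
      // Message wording mirrors the exception quoted in the issue.
      throw new IllegalStateException(
          "Can't index multiple DVs for " + dataFile + ": " + existing + " and " + dvLocation);
    }
  }

  /** Returns the DV registered for the data file, or null if none. */
  public String dvFor(String dataFile) {
    return dvByDataFile.get(dataFile);
  }
}
```

In this sketch a second add() for the same data file throws, which corresponds to the situation in the trace: two pos-deletes files both reference the same .orc data file, so planning cannot proceed.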



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
