TEOTEO520 opened a new issue, #7788:
URL: https://github.com/apache/gravitino/issues/7788
### Version
main branch
### Describe what's wrong
## Problem
When loading tables from external catalogs (e.g., Hive), duplicate columns
can cause `IllegalStateException: Duplicate key` in
`TableOperationDispatcher.updateColumnsIfNecessary()`.
## Root Cause
This issue typically occurs when:
1. In `HiveTableConverter.getColumns()`, `table.getSd().getCols()`
contains a partition field (e.g., `log_date`)
2. The same field also exists in `table.getPartitionKeys()`
3. This results in duplicate columns in the final `table.columns`
This situation can arise when different engines (Hive, Spark) operate on
the same Iceberg table, potentially causing HMS metadata inconsistencies.
<img width="1425" height="802" alt="Image"
src="https://github.com/user-attachments/assets/40418ef8-1d2d-490b-a27b-f7517020e070"
/>
<img width="1239" height="690" alt="Image"
src="https://github.com/user-attachments/assets/724e2499-14c4-4542-9c1a-e6a32216dae7"
/>
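One possible defensive fix, sketched here under stated assumptions, is to drop from the data columns any name that also appears in the partition keys before the two lists are concatenated. The `Field` class and the `mergeColumns` helper are hypothetical stand-ins for illustration only; they are not Gravitino's or Hive's actual types, and real code would operate on the HMS column objects returned by `table.getSd().getCols()` and `table.getPartitionKeys()`. Comparing names case-insensitively is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class HiveColumnMergeSketch {

  /** Hypothetical stand-in for an HMS column or partition-key entry. */
  static final class Field {
    final String name;
    final String type;

    Field(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  /**
   * Returns the data columns followed by the partition keys, skipping any data
   * column whose name (compared case-insensitively) is also a partition key.
   */
  static List<Field> mergeColumns(List<Field> dataCols, List<Field> partitionKeys) {
    Set<String> partitionNames = new HashSet<>();
    for (Field key : partitionKeys) {
      partitionNames.add(key.name.toLowerCase(Locale.ROOT));
    }

    List<Field> merged = new ArrayList<>();
    for (Field col : dataCols) {
      // Skip the data column if the partition keys already provide it.
      if (!partitionNames.contains(col.name.toLowerCase(Locale.ROOT))) {
        merged.add(col);
      }
    }
    merged.addAll(partitionKeys);
    return merged;
  }

  public static void main(String[] args) {
    // Inconsistent metadata: log_date is both a data column and a partition key.
    List<Field> dataCols =
        Arrays.asList(new Field("id", "bigint"), new Field("log_date", "string"));
    List<Field> partitionKeys = Arrays.asList(new Field("log_date", "string"));

    for (Field f : mergeColumns(dataCols, partitionKeys)) {
      System.out.println(f.name + " " + f.type);
    }
  }
}
```

With this approach the partition-key definition wins over the data-column definition. Whether that is the right precedence for Gravitino is a design decision this sketch does not settle.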
### Error message and/or stacktrace
"type": "RuntimeException",
"message": "Failed to operate object [new_version_export_bcut_d] operation [LOAD] under [bi_art], reason [Duplicate key (51,BaseColumn(name=log_date, comment=日期分区, dataType=org.apache.gravitino.rel.types.Types$StringType@69a72836, nullable=true, autoIncrement=false, defaultValue=org.apache.gravitino.rel.Column$$Lambda$723/554189346@405ccc23))]",
"stack": [
"java.lang.IllegalStateException: Duplicate key (51,BaseColumn(name=log_date, comment=日期分区, dataType=org.apache.gravitino.rel.types.Types$StringType@69a72836, nullable=true, autoIncrement=false, defaultValue=org.apache.gravitino.rel.Column$$Lambda$723/554189346@405ccc23))",
"\tat java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)",
"\tat java.util.HashMap.merge(HashMap.java:1254)",
"\tat java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)",
"\tat java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)",
"\tat java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)",
"\tat java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)",
"\tat java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)",
"\tat java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)",
"\tat java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)",
"\tat java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)",
"\tat java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)",
"\tat java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)",
"\tat org.apache.gravitino.catalog.TableOperationDispatcher.updateColumnsIfNecessary(TableOperationDispatcher.java:745)",
"\tat org.apache.gravitino.catalog.TableOperationDispatcher.updateColumnsIfNecessaryWhenLoad(TableOperationDispatcher.java:828)",
"\tat org.apache.gravitino.catalog.TableOperationDispatcher.loadTable(TableOperationDispatcher.java:131)",
"\tat org.apache.gravitino.hook.TableHookDispatcher.loadTable(TableHookDispatcher.java:63)",
"\tat org.apache.gravitino.catalog.TableNormalizeDispatcher.loadTable(TableNormalizeDispatcher.java:64)",
"\tat org.apache.gravitino.listener.TableEventDispatcher.loadTable(TableEventDispatcher.java:124)",
"\tat org.apache.gravitino.bili.onemeta.rest.TableOperations.lambda$loadTableKeeper$8(TableOperations.java:321)",
"\tat java.security.AccessController.doPrivileged(Native Method)",
"\tat javax.security.auth.Subject.doAs(Subject.java:422)",
"\tat org.apache.gravitino.utils.PrincipalUtils.doAs(PrincipalUtils.java:39)",
"\tat org.apache.gravitino.server.web.Utils.doAs(Utils.java:198)",
### How to reproduce
Load a table through a Hive catalog whose HMS metadata has the following shape:
1. `table.getSd().getCols()` contains a partition field (e.g., `log_date`).
2. The same field also exists in `table.getPartitionKeys()`.
3. `HiveTableConverter.getColumns()` then returns the field twice in `table.columns`, and the load fails with the `Duplicate key` error above.
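The failure mode itself can be reproduced outside Gravitino. The stack trace points at `Collectors.toMap`, which throws `IllegalStateException` when two stream elements map to the same key and no merge function is supplied. The sketch below is an assumption about the general shape of the indexing step, not the actual code in `TableOperationDispatcher`; the class and method names are hypothetical. It indexes column names by position, then shows a variant whose merge function keeps the first occurrence.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class DuplicateKeySketch {

  /** Maps each column name to its position; throws on a repeated name. */
  static Map<String, Integer> indexStrict(List<String> names) {
    return IntStream.range(0, names.size())
        .boxed()
        // No merge function: a repeated key raises IllegalStateException.
        .collect(Collectors.toMap(i -> names.get(i), i -> i));
  }

  /** Same mapping, but a repeated name keeps the position of its first occurrence. */
  static Map<String, Integer> indexKeepFirst(List<String> names) {
    return IntStream.range(0, names.size())
        .boxed()
        .collect(Collectors.toMap(i -> names.get(i), i -> i, (first, second) -> first));
  }

  public static void main(String[] args) {
    // log_date appears twice, as it would when HMS lists it in both places.
    List<String> columns = Arrays.asList("id", "log_date", "log_date");

    try {
      indexStrict(columns);
    } catch (IllegalStateException e) {
      // The exact message text varies between Java versions.
      System.out.println("strict indexing failed: " + e.getMessage());
    }

    System.out.println("keep-first indexing: " + indexKeepFirst(columns));
  }
}
```

A merge function only hides the symptom at load time; the duplicate would still be present in the column array, so deduplicating where the columns are assembled may be the more robust option.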
### Additional context
_No response_