TEOTEO520 opened a new issue, #7788:
URL: https://github.com/apache/gravitino/issues/7788

   ### Version
   
   main branch
   
   ### Describe what's wrong
   
     ## Problem
   
     When loading tables from external catalogs (e.g., Hive), duplicate columns can cause `IllegalStateException: Duplicate key` in `TableOperationDispatcher.updateColumnsIfNecessary()`.
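
For illustration, the failure mode can be reproduced outside Gravitino. The sketch below is not the code in `TableOperationDispatcher`; the class and method names are hypothetical. It only assumes, based on the stack trace, that the columns are collected with `Collectors.toMap` without a merge function, which throws `IllegalStateException` as soon as two entries share a key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical stand-in for the collection step; not Gravitino code.
final class DuplicateKeySketch {

  /** Maps each column name to its position; throws if a name repeats. */
  static Map<String, Integer> indexByName(List<String> names) {
    return IntStream.range(0, names.size())
        .boxed()
        // No merge function is given, so a repeated key is rejected.
        .collect(Collectors.toMap(i -> names.get(i), i -> i));
  }

  /** Returns true when indexByName rejects the given column names. */
  static boolean failsOnDuplicate(List<String> names) {
    try {
      indexByName(names);
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // log_date appears twice, as in the reported table.
    System.out.println(failsOnDuplicate(Arrays.asList("id", "log_date", "log_date"))); // true
    System.out.println(failsOnDuplicate(Arrays.asList("id", "log_date"))); // false
  }
}
```

The exact exception message differs between Java versions, so the sketch checks only the exception type.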
   
     ## Root Cause
   
     This issue typically occurs when:
     1. In `HiveTableConverter.getColumns()`, `table.getSd().getCols()` contains a partition field (e.g., `log_date`)
     2. The same field also exists in `table.getPartitionKeys()`
     3. This results in duplicate columns in the final `table.columns`
   
     This situation can arise when different engines (Hive, Spark) operate on the same Iceberg table, which can leave the Hive Metastore (HMS) metadata inconsistent.
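
One possible way to tolerate such metadata is to skip partition keys that already appear among the storage-descriptor columns when the two lists are combined. The sketch below is only an assumption about what a fix could look like, not the change adopted in Gravitino: the class and method names are hypothetical, and plain strings stand in for the Hive column objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; column names stand in for the real column objects.
final class ColumnMergeSketch {

  /**
   * Combines data columns and partition keys, keeping the first occurrence
   * of each name and preserving the original order.
   */
  static List<String> mergeColumnNames(List<String> dataCols, List<String> partitionKeys) {
    Set<String> merged = new LinkedHashSet<>(dataCols);
    // addAll ignores names that are already present, e.g. log_date.
    merged.addAll(partitionKeys);
    return new ArrayList<>(merged);
  }

  public static void main(String[] args) {
    List<String> dataCols = Arrays.asList("id", "log_date");
    List<String> partitionKeys = Arrays.asList("log_date");
    System.out.println(mergeColumnNames(dataCols, partitionKeys)); // [id, log_date]
  }
}
```

Keeping the first occurrence means the definition from `getCols()` wins over the one from `getPartitionKeys()`. Whether that is the right precedence, and whether names should be compared case-insensitively, is left open here.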
   
   <img width="1425" height="802" alt="Image" src="https://github.com/user-attachments/assets/40418ef8-1d2d-490b-a27b-f7517020e070" />
   
   <img width="1239" height="690" alt="Image" src="https://github.com/user-attachments/assets/724e2499-14c4-4542-9c1a-e6a32216dae7" />
   
   ### Error message and/or stacktrace
   
   "type": "RuntimeException",
     "message": "Failed to operate object [new_version_export_bcut_d] operation [LOAD] under [bi_art], reason [Duplicate key (51,BaseColumn(name=log_date, comment=日期分区, dataType=org.apache.gravitino.rel.types.Types$StringType@69a72836, nullable=true, autoIncrement=false, defaultValue=org.apache.gravitino.rel.Column$$Lambda$723/554189346@405ccc23))]",
     "stack": [
       "java.lang.IllegalStateException: Duplicate key (51,BaseColumn(name=log_date, comment=日期分区, dataType=org.apache.gravitino.rel.types.Types$StringType@69a72836, nullable=true, autoIncrement=false, defaultValue=org.apache.gravitino.rel.Column$$Lambda$723/554189346@405ccc23))",
       "\tat java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)",
       "\tat java.util.HashMap.merge(HashMap.java:1254)",
       "\tat java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)",
       "\tat java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)",
       "\tat java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)",
       "\tat java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)",
       "\tat java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)",
       "\tat java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)",
       "\tat java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)",
       "\tat java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)",
       "\tat java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)",
       "\tat java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)",
       "\tat org.apache.gravitino.catalog.TableOperationDispatcher.updateColumnsIfNecessary(TableOperationDispatcher.java:745)",
       "\tat org.apache.gravitino.catalog.TableOperationDispatcher.updateColumnsIfNecessaryWhenLoad(TableOperationDispatcher.java:828)",
       "\tat org.apache.gravitino.catalog.TableOperationDispatcher.loadTable(TableOperationDispatcher.java:131)",
       "\tat org.apache.gravitino.hook.TableHookDispatcher.loadTable(TableHookDispatcher.java:63)",
       "\tat org.apache.gravitino.catalog.TableNormalizeDispatcher.loadTable(TableNormalizeDispatcher.java:64)",
       "\tat org.apache.gravitino.listener.TableEventDispatcher.loadTable(TableEventDispatcher.java:124)",
       "\tat org.apache.gravitino.bili.onemeta.rest.TableOperations.lambda$loadTableKeeper$8(TableOperations.java:321)",
       "\tat java.security.AccessController.doPrivileged(Native Method)",
       "\tat javax.security.auth.Subject.doAs(Subject.java:422)",
       "\tat org.apache.gravitino.utils.PrincipalUtils.doAs(PrincipalUtils.java:39)",
       "\tat org.apache.gravitino.server.web.Utils.doAs(Utils.java:198)",
   
   ### How to reproduce
   
     1. Use a Hive table whose `table.getSd().getCols()` contains a partition field (e.g., `log_date`) that also exists in `table.getPartitionKeys()`.
     2. Load the table through Gravitino, so that `HiveTableConverter.getColumns()` returns the field twice in the final `table.columns`.
     3. The `LOAD` operation fails with `IllegalStateException: Duplicate key`.
   
   ### Additional context
   
   _No response_

