mchades commented on code in PR #7789:
URL: https://github.com/apache/gravitino/pull/7789#discussion_r2256684143


##########
core/src/main/java/org/apache/gravitino/catalog/TableOperationDispatcher.java:
##########
@@ -624,12 +624,32 @@ private Pair<Boolean, List<ColumnEntity>> updateColumnsIfNecessary(
             ? Collections.emptyMap()
             : IntStream.range(0, tableFromCatalog.columns().length)
                 .mapToObj(i -> Pair.of(i, tableFromCatalog.columns()[i]))
-                .collect(Collectors.toMap(p -> p.getRight().name(), Function.identity()));
+                .collect(
+                    Collectors.toMap(
+                        p -> p.getRight().name(),
+                        Function.identity(),
+                        (existing, replacement) -> {
+                          LOG.warn(
+                              "Duplicate column name '{}' found at position {} and {}, using the first occurrence",
+                              existing.getRight().name(),
+                              existing.getLeft(),
+                              replacement.getLeft());
+                          return existing;

Review Comment:
   > This commonly occurs when:
   >
   > HiveTableConverter.getColumns() includes a partition field (e.g., log_date) from table.getSd().getCols()
   > The same field also exists in table.getPartitionKeys()
   > This results in duplicate columns in the final table schema
   
   In the case you mentioned, we should fix the Hive catalog logic so that it never produces duplicate columns, instead of tolerating duplicate column names here.
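
A minimal sketch of that direction, assuming the column lists can be reduced to plain names. `ColumnMergeSketch`, `mergeColumns`, and `indexByName` are hypothetical names for illustration only, not existing Gravitino or Hive APIs, and the real converter works with column objects rather than strings. The idea is to skip a partition key that is already present in the data columns when the schema is built, so the plain two-argument `Collectors.toMap` (which throws `IllegalStateException` on a duplicate key) can stay as a guard rather than be relaxed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ColumnMergeSketch {

  // Hypothetical helper: append partition keys to the data columns, keeping
  // only the first occurrence of each name and preserving the original order.
  public static List<String> mergeColumns(List<String> dataColumns, List<String> partitionKeys) {
    Set<String> merged = new LinkedHashSet<>(dataColumns);
    merged.addAll(partitionKeys);
    return new ArrayList<>(merged);
  }

  // Index columns by name with the two-argument toMap, which throws
  // IllegalStateException on a duplicate key instead of silently picking one.
  public static Map<String, Integer> indexByName(List<String> columns) {
    return IntStream.range(0, columns.size())
        .boxed()
        .collect(Collectors.toMap(columns::get, i -> i));
  }

  public static void main(String[] args) {
    // Assumed example data: log_date appears both as a data column and as a partition key.
    List<String> dataColumns = Arrays.asList("id", "msg", "log_date");
    List<String> partitionKeys = Arrays.asList("log_date");

    List<String> merged = mergeColumns(dataColumns, partitionKeys);
    System.out.println(merged);
    System.out.println(indexByName(merged));
  }
}
```

With the duplicate removed at the source, the dispatcher would not need a merge function, and a duplicate arriving from any catalog would still fail loudly instead of being silently dropped.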


