[GitHub] [flink] wenlong88 commented on a change in pull request #18017: [FLINK-25171] Validation of duplicate fields in ddl sql

GitBox Wed, 15 Dec 2021 01:52:05 -0800


wenlong88 commented on a change in pull request #18017:
URL: https://github.com/apache/flink/pull/18017#discussion_r769442384




##########
File path: 
flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/operations/MergeTableLikeUtilTest.java
##########
@@ -126,6 +126,72 @@ public void mergeWithIncludeFailsOnDuplicateColumn() {
                 null);
     }
 
+    @Test
+    public void mergeWithIncludeFailsOnDuplicateRegularColumn() {
+        TableSchema sourceSchema =
+                TableSchema.builder().add(TableColumn.physical("one", 
DataTypes.INT())).build();
+
+        List<SqlNode> derivedColumns =
+                Arrays.asList(
+                        regularColumn("two", DataTypes.INT()),
+                        regularColumn("two", DataTypes.INT()),
+                        regularColumn("four", DataTypes.STRING()));
+
+        thrown.expect(ValidationException.class);
+        thrown.expectMessage("A regular Column named 'two' already exists in 
the table.");
+        util.mergeTables(
+                getDefaultMergingStrategies(),
+                sourceSchema,
+                derivedColumns,
+                Collections.emptyList(),
+                null);
+    }
+
+    @Test
+    public void 
mergeWithIncludeFailsOnDuplicateRegularColumnAndComputeColumn() {
+        TableSchema sourceSchema =
+                TableSchema.builder().add(TableColumn.physical("one", 
DataTypes.INT())).build();
+
+        List<SqlNode> derivedColumns =
+                Arrays.asList(
+                        regularColumn("two", DataTypes.INT()),
+                        computedColumn("three", plus("two", "3")),
+                        regularColumn("three", DataTypes.INT()),
+                        regularColumn("four", DataTypes.STRING()));
+
+        thrown.expect(ValidationException.class);
+        thrown.expectMessage("A column named 'three' already exists in the 
table.");
+        util.mergeTables(
+                getDefaultMergingStrategies(),
+                sourceSchema,
+                derivedColumns,
+                Collections.emptyList(),
+                null);
+    }
+
+    @Test
+    public void 
mergeWithIncludeFailsOnDuplicateRegularColumnAndMetadataColumn() {
+        TableSchema sourceSchema =
+                TableSchema.builder().add(TableColumn.physical("one", 
DataTypes.INT())).build();
+
+        List<SqlNode> derivedColumns =
+                Arrays.asList(
+                        metadataColumn("two", DataTypes.INT(), true),
+                        computedColumn("three", plus("two", "3")),
+                        regularColumn("two", DataTypes.INT()),
+                        regularColumn("four", DataTypes.STRING()));
+
+        thrown.expect(ValidationException.class);
+        thrown.expectMessage(

Review comment:
       in this case, I think it would be better to throw with information that 
there are duplicate column with name 'two' in metadata column and regular 
column?

##########
File path: 
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/operations/MergeTableLikeUtil.java
##########
@@ -426,6 +433,14 @@ private void appendDerivedColumns(
                         }
                     }
 
+                    Set<String> physicalFieldNames = 
physicalFieldNamesToTypes.keySet();
+                    Set<String> metadataFieldNames = 
metadataFieldNamesToTypes.keySet();
+                    final Set<String> result = new 
LinkedHashSet<>(physicalFieldNames);
+                    result.retainAll(metadataFieldNames);
+                    if (!result.isEmpty()) {
+                        throw new ValidationException(
+                                "A field name conflict exists between a field 
of the regular type and a field of the Metadata type.");
+                    }

Review comment:
       hi, @jelly-1203 thanks for the update and  analysis, I agree that it not 
 possible to add unified check at the end. but I still think that the check 
here can be improved a bit:
   1. it seems that the check here is not relevant to the new added 
ComputingColumn, if you want to check the duplication, it is better to add it 
when updating metadataFieldNamesToTypes.
   2. according to current implementation, regular columns have top priority(we 
collect collectPhysicalFieldsTypes at the beginning), so we may also need to 
check if there is duplicated name in physicalFieldNamesToTypes when trying to a 
metadata column or computed column. If we add such check, the check in 1 is not 
necessary any more.
   
   what do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] wenlong88 commented on a change in pull request #18017: [FLINK-25171] Validation of duplicate fields in ddl sql

Reply via email to