[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #4848: Spark: Fix Alignment of Merge Commands with Mixed Case

GitBox Mon, 23 May 2022 17:45:46 -0700


aokolnychyi commented on code in PR #4848:
URL: https://github.com/apache/iceberg/pull/4848#discussion_r879962532



##########
spark/v3.2/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlignRowLevelCommandAssignments.scala:
##########
@@ -84,7 +84,16 @@ object AlignRowLevelCommandAssignments
           throw new AnalysisException("Not matched actions can only contain 
INSERT")
       }
 
-      m.copy(matchedActions = alignedMatchedActions, notMatchedActions = 
alignedNotMatchedActions)
+      val alignedMerge = m.copy(
+        matchedActions = alignedMatchedActions,
+        notMatchedActions = alignedNotMatchedActions)
+
+      if (!alignedMerge.aligned) {

Review Comment:
   An alternative way of solving this is to provide a check like 
`MergeIntoIcebergTableResolutionCheck`. It will be run after all resolution 
rules and we will know for sure that we failed to align the assignments. If we 
fail here, we won't give Spark any chances to fix the alignments using other 
rules.



##########
spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java:
##########
@@ -1186,6 +1186,32 @@ public void testMergeAlignsUpdateAndInsertActions() {
         sql("SELECT * FROM %s ORDER BY id", tableName));
   }
 
+  @Test
+  public void testMergeMixedCaseAlignsUpdateAndInsertActions() {
+    createAndInitTable("id INT, a INT, b STRING", "{ \"id\": 1, \"a\": 2, 
\"b\": \"str\" }");
+    createOrReplaceView(
+        "source",
+        "{ \"id\": 1, \"c1\": -2, \"c2\": \"new_str_1\" }\n" +
+            "{ \"id\": 2, \"c1\": -20, \"c2\": \"new_str_2\" }");

Review Comment:
   nit: the rest of this file aligns the json records slightly differently (no 
extra indentation for 2nd record)



##########
spark/v3.2/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlignRowLevelCommandAssignments.scala:
##########
@@ -84,7 +84,16 @@ object AlignRowLevelCommandAssignments
           throw new AnalysisException("Not matched actions can only contain 
INSERT")
       }
 
-      m.copy(matchedActions = alignedMatchedActions, notMatchedActions = 
alignedNotMatchedActions)
+      val alignedMerge = m.copy(
+        matchedActions = alignedMatchedActions,
+        notMatchedActions = alignedNotMatchedActions)
+
+      if (!alignedMerge.aligned) {

Review Comment:
   I wonder whether we should also cover UPDATEs too?



##########
spark/v3.2/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/expressions/AssignmentUtils.scala:
##########
@@ -40,7 +40,15 @@ object AssignmentUtils extends SQLConfHelper {
     sameSize && table.output.zip(assignments).forall { case (attr, assignment) 
=>
       val key = assignment.key
       val value = assignment.value
-      toAssignmentRef(attr) == toAssignmentRef(key) &&
+      val refsEqual = if (conf.caseSensitiveAnalysis) {

Review Comment:
   Seems like we can use `conf.resolver` that would abstract this away? Then we 
will have only one case to handle.



##########
spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java:
##########
@@ -1186,6 +1186,32 @@ public void testMergeAlignsUpdateAndInsertActions() {
         sql("SELECT * FROM %s ORDER BY id", tableName));
   }
 
+  @Test
+  public void testMergeMixedCaseAlignsUpdateAndInsertActions() {
+    createAndInitTable("id INT, a INT, b STRING", "{ \"id\": 1, \"a\": 2, 
\"b\": \"str\" }");
+    createOrReplaceView(
+        "source",
+        "{ \"id\": 1, \"c1\": -2, \"c2\": \"new_str_1\" }\n" +
+            "{ \"id\": 2, \"c1\": -20, \"c2\": \"new_str_2\" }");
+
+    sql("MERGE INTO %s t USING source " +
+            "ON t.iD == source.Id " +

Review Comment:
   nit: same for MERGE statements



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #4848: Spark: Fix Alignment of Merge Commands with Mixed Case

Reply via email to