[GitHub] [iceberg] rdblue commented on a change in pull request #3764: Spark: Implement copy-on-write UPDATE

GitBox Sun, 19 Dec 2021 13:22:14 -0800


rdblue commented on a change in pull request #3764:
URL: https://github.com/apache/iceberg/pull/3764#discussion_r772002405




##########
File path: spark/v3.2/spark-extensions/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelCommandDynamicPruning.scala
##########
@@ -86,6 +89,16 @@ case class RowLevelCommandDynamicPruning(spark: SparkSession) extends Rule[Logic
     val matchingRowsPlan = command match {
       case d: DeleteFromIcebergTable =>
         Filter(d.condition.get, relation)
+      case u: UpdateIcebergTable =>
+        // UPDATEs with subqueries may be rewritten using a UNION with two identical scan relations
+        // each scan relation will get its own dynamic filter that will be shared during execution
+        // the analyzer will assign different expr IDs for each scan relation output attributes
+        // that's why the condition may refer to invalid attr expr IDs and must be transformed

Review comment:
   Okay, after reading through all the cases for rewriting UPDATE, this makes
   a lot more sense. It would have helped me while reading through this if the
   comment either pointed to the union case in `RewriteUpdateTable` or
   explained the rewrite here and gave a little more context on when there
   will be a union.

   As for my other comments, I think it would still be good to rename
   `attrMap`, and I'm not sure whether there could be a case where
   `u.table.output` doesn't match `relation.output`.
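
   To make the last concern concrete, here is a minimal sketch of remapping a
   condition from one set of expr IDs to another by position. It does not use
   Spark's classes: `Attr`, `Ref`, `Lit`, `EqualTo`, `And`, and `ExprIdRemap`
   are hypothetical stand-ins for Catalyst attributes and expressions, and the
   positional pairing with a size and name check is an assumption about how a
   mismatch between `u.table.output` and `relation.output` could be caught,
   not what the PR does.

```scala
// Hypothetical stand-ins for Catalyst attributes and expressions (not Spark's classes).
final case class Attr(name: String, exprId: Long)

sealed trait Expr
final case class Ref(attr: Attr) extends Expr
final case class Lit(value: Any) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

object ExprIdRemap {
  // Pair the two outputs by position and key the result by the old expr ID.
  // Fails fast if the outputs differ in size or if names at a position differ.
  def buildAttrMapping(tableOutput: Seq[Attr], relationOutput: Seq[Attr]): Map[Long, Attr] = {
    require(
      tableOutput.size == relationOutput.size,
      s"output size mismatch: ${tableOutput.size} vs ${relationOutput.size}")
    tableOutput.zip(relationOutput).map { case (t, r) =>
      require(t.name == r.name, s"attribute mismatch: ${t.name} vs ${r.name}")
      t.exprId -> r
    }.toMap
  }

  // Rewrite every attribute reference whose expr ID has a mapping;
  // references without a mapping and literals are left unchanged.
  def remap(expr: Expr, mapping: Map[Long, Attr]): Expr = expr match {
    case Ref(a)        => Ref(mapping.getOrElse(a.exprId, a))
    case EqualTo(l, r) => EqualTo(remap(l, mapping), remap(r, mapping))
    case And(l, r)     => And(remap(l, mapping), remap(r, mapping))
    case lit: Lit      => lit
  }

  def main(args: Array[String]): Unit = {
    // The condition was resolved against the table output (IDs 1 and 2),
    // but the scan relation being filtered exposes IDs 10 and 11.
    val tableOutput    = Seq(Attr("id", 1L), Attr("dep", 2L))
    val relationOutput = Seq(Attr("id", 10L), Attr("dep", 11L))
    val cond = And(
      EqualTo(Ref(Attr("id", 1L)), Lit(5)),
      EqualTo(Ref(Attr("dep", 2L)), Lit("hr")))

    val mapping = buildAttrMapping(tableOutput, relationOutput)
    println(remap(cond, mapping))
  }
}
```

   With a check like the `require` calls above, a case where the two outputs
   do not line up would fail loudly instead of silently leaving stale expr IDs
   in the condition.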




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



