rdblue commented on a change in pull request #3764:
URL: https://github.com/apache/iceberg/pull/3764#discussion_r771999598
##########
File path:
spark/v3.2/spark-extensions/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelCommandDynamicPruning.scala
##########
@@ -86,6 +89,16 @@ case class RowLevelCommandDynamicPruning(spark: SparkSession) extends Rule[Logic
val matchingRowsPlan = command match {
case d: DeleteFromIcebergTable =>
Filter(d.condition.get, relation)
+ case u: UpdateIcebergTable =>
+ // UPDATEs with subqueries may be rewritten using a UNION with two identical scan relations
+ // each scan relation will get its own dynamic filter that will be shared during execution
+ // the analyzer will assign different expr IDs for each scan relation output attributes
+ // that's why the condition may refer to invalid attr expr IDs and must be transformed
Review comment:
If I understand correctly, sometimes a plan gets transformed, from this:
```
ReplaceData
  Update(id#1 IN (1, 5), data#2 = 'foo')
    V2Relation(db.table, [id#1, data#2])
```
To this:
```
ReplaceData
  Update(id#1 IN (1, 5), data#2 = 'foo')
    Union([id#1, data#2])
      V2Relation(db.table, [id#1, data#2])
      V2Relation(db.table, [id#4, data#5])
```
So this is basically creating the dynamic filter for each scan separately
and fixing up the attrs, from the update's IDs to the relation's IDs.
Is there a test case for this, or is it something that you ran into in
practice?
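
To make sure I'm reading the fix-up correctly, here is a minimal sketch of what I think it does. `Attr`, `AttrRemapSketch`, `buildLookup`, and `remap` are hypothetical stand-ins I made up for illustration, not Catalyst classes and not the code in this PR. It assumes the update's output and the scan relation's output line up by position.

```scala
// Hypothetical stand-in for a Catalyst attribute (name plus expr ID); not the Spark API.
final case class Attr(name: String, exprId: Int)

object AttrRemapSketch {
  // Pair the update's output attrs with one scan relation's attrs by position,
  // keyed by the update-side expr ID. Assumes both outputs have the same shape.
  def buildLookup(updateOutput: Seq[Attr], relationOutput: Seq[Attr]): Map[Int, Attr] = {
    require(updateOutput.size == relationOutput.size, "outputs must align by position")
    updateOutput.zip(relationOutput).map { case (u, r) => u.exprId -> r }.toMap
  }

  // Replace each attr referenced by the condition with the scan relation's attr.
  // Attrs with no entry in the lookup are left unchanged.
  def remap(conditionRefs: Seq[Attr], lookup: Map[Int, Attr]): Seq[Attr] =
    conditionRefs.map(a => lookup.getOrElse(a.exprId, a))
}
```

With the plan above, the condition's reference to `id#1` would be rewritten to `id#4` when building the filter for the second scan, and left as `id#1` for the first.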
It took me a while to understand this (assuming that I do) and I think it
would be nice if it were more clear what is being replaced. Renaming `attrMap`
to `scanRelationAttrs` would probably help so it is clear we're looking up the
right attr for the scan relation.
It would also be good to add a comment that explains why `u.table.output`
is always going to match `relation.output`. In fact, I'm not sure that it will.
What happens when a column in the data is not used to produce the output? Is it
pruned?
For example, `UPDATE t SET id = 1 WHERE true` doesn't need to project `id`
from the table. If `id` is not projected, then the `attrMap` is incorrect. Is
there a reason why that can't happen?
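
A small sketch of the failure mode I'm worried about, under the assumption that pruning can drop a column from the relation's output (which is exactly the open question). `PruningSketch` and its members are hypothetical names for illustration, not Catalyst or Iceberg code.

```scala
object PruningSketch {
  // Hypothetical stand-in for a Catalyst attribute; not the Spark API.
  final case class Attr(name: String, exprId: Int)

  // Positional pairing, as a plain zip would do. zip truncates to the shorter
  // side without complaint, so a pruned column shifts every later pairing.
  def byPosition(tableOutput: Seq[Attr], relationOutput: Seq[Attr]): Map[Attr, Attr] =
    tableOutput.zip(relationOutput).toMap

  // Name-based pairing. Attrs without a counterpart are simply absent from
  // the map, so a missing column shows up as a failed lookup, not a wrong one.
  def byName(tableOutput: Seq[Attr], relationOutput: Seq[Attr]): Map[Attr, Attr] = {
    val relationByName = relationOutput.map(a => a.name -> a).toMap
    tableOutput.flatMap(t => relationByName.get(t.name).map(r => t -> r)).toMap
  }
}
```

If the table output is `[id#1, data#2]` and the relation only projects `[data#5]`, the positional version maps `id#1` to `data#5`, while the name-based version maps only `data#2` to `data#5`. I'm not suggesting name matching is the right fix here, only that the positional map needs a stated reason why the two outputs always line up.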
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]