rdblue commented on a change in pull request #3984:
URL: https://github.com/apache/iceberg/pull/3984#discussion_r798745128
##########
File path:
spark/v3.2/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteUpdateTable.scala
##########
@@ -126,6 +128,33 @@ object RewriteUpdateTable extends RewriteRowLevelCommand {
ReplaceData(writeRelation, updatedAndRemainingRowsPlan, relation)
}
+ // build a rewrite plan for sources that support row deltas
+ private def buildWriteDeltaPlan(
+ relation: DataSourceV2Relation,
+ table: RowLevelOperationTable,
+ assignments: Seq[Assignment],
+ cond: Expression): WriteDelta = {
+
+ // resolve all needed attrs (e.g. row ID and any required metadata attrs)
+ val rowAttrs = relation.output
+ val rowIdAttrs = resolveRowIdAttrs(relation, table.operation)
+ val metadataAttrs = resolveRequiredMetadataAttrs(relation, table.operation)
+
+ // construct a scan relation and include all required metadata columns
+ val readRelation = buildReadRelation(relation, table, metadataAttrs, rowIdAttrs)
+
+ // build a plan for updated records that match the cond
+ val matchedRowsPlan = Filter(cond, readRelation)
+ val updatedRowsPlan = buildUpdateProjection(matchedRowsPlan, assignments)
+ val operationType = Alias(Literal(UPDATE_OPERATION), OPERATION_COLUMN)()
+ val project = Project(operationType +: updatedRowsPlan.output, updatedRowsPlan)
+
+ // build a plan to write the row delta to the table
+ val writeRelation = relation.copy(table = table)
Review comment:
This caught me the first time I saw it, too. What's happening here is
that we're replacing the original table with the `RowLevelOperationTable` that
wraps the `RowLevelOperation`, so that scans produced by the table are
coordinated with the writes done through it. Going through the table interface
means we don't need to handle the operation separately.
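
To make the idea concrete, here is a rough, self-contained sketch of the pattern. Everything in it (`Table`, `BaseTable`, `RowLevelOperationSketch`, `OperationTableSketch`, `RelationSketch`, and the file names) is a hypothetical stand-in made up for illustration, not the Spark or Iceberg classes. It only shows why sharing one wrapping table between the read and write relations lets the write see what the scan did.

```scala
// Hypothetical, simplified stand-ins; these are NOT the Spark or Iceberg interfaces.
trait Table { def name: String }

final case class BaseTable(name: String) extends Table

// One operation object owns the state shared between the scan and the write.
final class RowLevelOperationSketch(val command: String) {
  private var scannedFiles: Set[String] = Set.empty

  // The scan records which files it read.
  def newScan(files: Set[String]): Set[String] = {
    scannedFiles = files
    files
  }

  // The write is configured from what the scan recorded.
  def newWrite(): String =
    s"$command rewriting ${scannedFiles.toSeq.sorted.mkString(",")}"
}

// Wraps the original table so the operation travels behind the table interface.
final case class OperationTableSketch(
    underlying: Table,
    operation: RowLevelOperationSketch) extends Table {
  def name: String = underlying.name
  def scan(files: Set[String]): Set[String] = operation.newScan(files)
  def write(): String = operation.newWrite()
}

final case class RelationSketch(table: Table, output: Seq[String])

object Demo {
  def main(args: Array[String]): Unit = {
    val relation = RelationSketch(BaseTable("db.t"), Seq("id", "data"))
    val opTable = OperationTableSketch(relation.table, new RowLevelOperationSketch("UPDATE"))

    // Same move as relation.copy(table = table) in the diff:
    // both relations point at the one wrapping table.
    val readRelation = relation.copy(table = opTable)
    val writeRelation = relation.copy(table = opTable)

    readRelation.table.asInstanceOf[OperationTableSketch].scan(Set("f2", "f1"))
    println(writeRelation.table.asInstanceOf[OperationTableSketch].write())
  }
}
```

Because both relations carry the same wrapping table, nothing else in the plan has to pass the operation around; the write side simply asks its table.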
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]