rdblue commented on pull request #1947: URL: https://github.com/apache/iceberg/pull/1947#issuecomment-748305937
I looked into resolution and there is a rule in Spark: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L1682-L1710

It looks like if the assignments are out of order or cover only a subset of the output columns, the expressions are left as-is. If there are no assignments at all, the source table's columns are used to set the output columns by position, with an `Attribute` from the target table as the LHS of each assignment.

We will need an analyzer rule that fills in the missing assignments for UPDATE actions, checks the order of assignments by name, and validates that INSERT actions are complete. I also think this rule should convert the plan to a different MergeInto logical plan. The plan in Spark is not sufficient because it considers itself resolved as soon as the assignments are resolved, not when the assignments actually produce the expected output.

That behavior is strange: resolution produces assignments when there aren't any, but allows them to be missing when some are present.
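To make the proposed rule concrete, here is a minimal sketch of the alignment logic it would need. The types and names (`Attr`, `Assignment`, `AlignAssignments`) are simplified stand-ins rather than Spark's actual catalyst classes, and expressions are plain strings for brevity; the real rule would operate on resolved `Expression`s inside a `Rule[LogicalPlan]`:

```scala
// Simplified stand-ins for catalyst's attribute and assignment types.
case class Attr(name: String)
case class Assignment(key: Attr, value: String)

object AlignAssignments {
  // UPDATE: reorder assignments to match the target's column order and
  // fill any column that has no assignment with an identity assignment
  // (col = col), so the output always covers every target column.
  def alignUpdate(targetCols: Seq[Attr], assignments: Seq[Assignment]): Seq[Assignment] = {
    val byName = assignments.map(a => a.key.name -> a).toMap
    targetCols.map(col => byName.getOrElse(col.name, Assignment(col, col.name)))
  }

  // INSERT: every target column must be assigned; fail analysis otherwise.
  def validateInsert(targetCols: Seq[Attr], assignments: Seq[Assignment]): Unit = {
    val assigned = assignments.map(_.key.name).toSet
    val missing = targetCols.map(_.name).filterNot(assigned)
    require(missing.isEmpty,
      s"INSERT must assign all target columns; missing: ${missing.mkString(", ")}")
  }
}

// Example: an UPDATE that only sets `data` is expanded to cover all columns.
val target = Seq(Attr("id"), Attr("data"), Attr("ts"))
val aligned = AlignAssignments.alignUpdate(target, Seq(Assignment(Attr("data"), "s.data")))
// => Seq(Assignment(id, "id"), Assignment(data, "s.data"), Assignment(ts, "ts"))
```

With the assignments fully aligned like this, the new MergeInto logical plan can treat "resolved" as "produces exactly the target table's output columns, in order", rather than just "all assignment expressions are resolved".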
