alexeykudinkin commented on code in PR #6264:
URL: https://github.com/apache/hudi/pull/6264#discussion_r942012496
##########
hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/HoodieSpark3CatalystPlanUtils.scala:
##########
@@ -52,8 +57,56 @@ abstract class HoodieSpark3CatalystPlanUtils extends HoodieCatalystPlansUtils {
}
}
-  override def toTableIdentifier(relation: UnresolvedRelation): TableIdentifier = {
-    relation.multipartIdentifier.asTableIdentifier
+  override def resolve(spark: SparkSession, relation: UnresolvedRelation): Option[CatalogTable] = {
Review Comment:
Moreover, this part is not even affected by your changes: they only affect Time Travel queries, not the Insert Into statement.
##########
hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/HoodieSpark3CatalystPlanUtils.scala:
##########
@@ -52,8 +57,56 @@ abstract class HoodieSpark3CatalystPlanUtils extends HoodieCatalystPlansUtils {
}
}
-  override def toTableIdentifier(relation: UnresolvedRelation): TableIdentifier = {
-    relation.multipartIdentifier.asTableIdentifier
+  override def resolve(spark: SparkSession, relation: UnresolvedRelation): Option[CatalogTable] = {
Review Comment:
@YannByron thanks for the detailed elaboration! Appreciate that!
The point I'm trying to make is that Spark _should be_ resolving the relations, NOT Hudi: we should not take away functionality from Spark that we don't customize, extend, or fix. In this case we are doing none of those things, so we should leave resolution to Spark as it is.
My suggestions are basically the following:
1. Avoid copying any resolution logic in your PR; instead
2. Fix the things that fail individually:
- `isHoodieTable` should NOT be invoked on an `UnresolvedRelation`. Instead, we should add a conditional check on whether the target relation of the `InsertIntoStatement` has been fully resolved by Spark.
- The same goes for `TimeTravelRelation`, which is currently implemented incorrectly: it is defined as a `Command`, while in reality it should just be either a `LeafNode` or a `UnaryNode` (in which case it would be resolved by Spark automatically).
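To make the `Command` vs `UnaryNode` distinction concrete, here is a self-contained toy model of the idea. None of these classes are Spark's actual Catalyst classes (all names are illustrative); the sketch only shows why a command-style node that hides its child reports itself resolved without the child ever being visited, while a unary node propagates resolution from its child and gets rewritten automatically:

```scala
// Toy plan tree, NOT Spark's Catalyst classes. In Spark, `resolved` is
// similarly derived from `childrenResolved`, and the analyzer keeps applying
// rules until the plan is resolved.
sealed trait LogicalPlan {
  def children: Seq[LogicalPlan]
  def resolved: Boolean = children.forall(_.resolved)
}

case class UnresolvedRelation(name: String) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Nil
  override def resolved: Boolean = false // must be replaced by an analyzer rule
}

case class ResolvedRelation(name: String) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Nil
}

// Modeled after a UnaryNode: resolved iff its single child is resolved.
case class TimeTravelAsUnaryNode(child: LogicalPlan, timestamp: String) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Seq(child)
}

// Modeled after a Command that hides its relation: the child is not exposed
// as a tree child, so a resolution pass never sees it.
case class TimeTravelAsCommand(relation: LogicalPlan, timestamp: String) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Nil
}

object Demo {
  // A toy "resolution rule": swap every UnresolvedRelation it can reach.
  def analyze(plan: LogicalPlan): LogicalPlan = plan match {
    case UnresolvedRelation(n)        => ResolvedRelation(n)
    case TimeTravelAsUnaryNode(c, ts) => TimeTravelAsUnaryNode(analyze(c), ts)
    case other                        => other // command: analyzer never descends
  }
}
```

Running the toy rule on both shapes: the unary-node version comes back with a `ResolvedRelation` child, while the command version claims `resolved == true` even though the relation it wraps is still an `UnresolvedRelation`, which is exactly the failure mode being described.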
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]