alexeykudinkin commented on code in PR #6264:
URL: https://github.com/apache/hudi/pull/6264#discussion_r942012496
##########
hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/HoodieSpark3CatalystPlanUtils.scala:
##########
@@ -52,8 +57,56 @@ abstract class HoodieSpark3CatalystPlanUtils extends HoodieCatalystPlansUtils {
}
}
-  override def toTableIdentifier(relation: UnresolvedRelation): TableIdentifier = {
-    relation.multipartIdentifier.asTableIdentifier
+  override def resolve(spark: SparkSession, relation: UnresolvedRelation): Option[CatalogTable] = {
Review Comment:
Moreover, this part is not even affected by your changes: they only affect Time Travel queries, not the Insert Into statement.
##########
hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/HoodieSpark3CatalystPlanUtils.scala:
##########
@@ -52,8 +57,56 @@ abstract class HoodieSpark3CatalystPlanUtils extends HoodieCatalystPlansUtils {
}
}
-  override def toTableIdentifier(relation: UnresolvedRelation): TableIdentifier = {
-    relation.multipartIdentifier.asTableIdentifier
+  override def resolve(spark: SparkSession, relation: UnresolvedRelation): Option[CatalogTable] = {
Review Comment:
@YannByron thanks for the detailed elaboration! Appreciate that!
The point I'm trying to make is that Spark _should be_ resolving the relations, NOT Hudi: we should not take away functionality from Spark that we don't customize, extend, or fix. In this case we are doing none of those things, so we should leave resolution to Spark as it is.
My suggestions are basically the following:
1. Avoid copying any resolution logic in your PR; instead
2. Fix the things that fail individually:
- `isHoodieTable` should NOT be invoked on an `UnresolvedRelation`. Instead, we should add a conditional check on whether the target relation of the `InsertIntoStatement` has been fully resolved by Spark.
- The same goes for `TimeTravelRelation`, which is currently implemented incorrectly: it is defined as a `Command`, while in reality it should just be either a `LeafNode` or a `UnaryNode` (in which case it would be resolved by Spark automatically).
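To make the `Command` vs `UnaryNode` distinction concrete, here is a self-contained toy model of the idea. None of these classes are Spark's actual Catalyst classes (all names are illustrative); the sketch only shows why a command-style node that hides its child reports itself resolved without the child ever being visited, while a unary node propagates resolution from its child and gets rewritten automatically:

```scala
// Toy plan tree, NOT Spark's Catalyst classes. In Spark, `resolved` is
// similarly derived from `childrenResolved`, and the analyzer keeps applying
// rules until the plan is resolved.
sealed trait LogicalPlan {
  def children: Seq[LogicalPlan]
  def resolved: Boolean = children.forall(_.resolved)
}

case class UnresolvedRelation(name: String) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Nil
  override def resolved: Boolean = false // must be replaced by an analyzer rule
}

case class ResolvedRelation(name: String) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Nil
}

// Modeled after a UnaryNode: resolved iff its single child is resolved.
case class TimeTravelAsUnaryNode(child: LogicalPlan, timestamp: String) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Seq(child)
}

// Modeled after a Command that hides its relation: the child is not exposed
// as a tree child, so a resolution pass never sees it.
case class TimeTravelAsCommand(relation: LogicalPlan, timestamp: String) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Nil
}

object Demo {
  // A toy "resolution rule": swap every UnresolvedRelation it can reach.
  def analyze(plan: LogicalPlan): LogicalPlan = plan match {
    case UnresolvedRelation(n)        => ResolvedRelation(n)
    case TimeTravelAsUnaryNode(c, ts) => TimeTravelAsUnaryNode(analyze(c), ts)
    case other                        => other // command: analyzer never descends
  }
}
```

Running the toy rule on both shapes: the unary-node version comes back with a `ResolvedRelation` child, while the command version claims `resolved == true` even though the relation it wraps is still an `UnresolvedRelation`, which is exactly the failure mode being described.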
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]