YannByron commented on code in PR #6264:
URL: https://github.com/apache/hudi/pull/6264#discussion_r942197333


##########
hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/HoodieSpark3CatalystPlanUtils.scala:
##########
@@ -52,8 +57,56 @@ abstract class HoodieSpark3CatalystPlanUtils extends HoodieCatalystPlansUtils {
     }
   }
 
-  override def toTableIdentifier(relation: UnresolvedRelation): TableIdentifier = {
-    relation.multipartIdentifier.asTableIdentifier
+  override def resolve(spark: SparkSession, relation: UnresolvedRelation): Option[CatalogTable] = {

Review Comment:
   Let me explain this in detail.
   1) Hudi injects `HoodieResolveReferences` into the `Resolution` batch of the Analyzer.
   2) The `Resolution` batch applies these rules in order: `ResolveInsertInto`, `ResolveRelations`, `ResolveTables`, ..., `FindDataSourceTable`, ..., `HoodieResolveReferences`.
   3) The SQL `insert into target_table select * from source_table` is analyzed by the rules above in that order.
   **In the first iteration** over the Analyzer's `Resolution` batch:
   - `ResolveInsertInto` does not apply.
   - `ResolveRelations` resolves the child of `InsertIntoStatement` (i.e. the query). At this point `target_table` is still unresolved.
   - `FindDataSourceTable` does not apply, because the first argument (`table`) of `InsertIntoStatement` is still an `UnresolvedRelation`, not an `UnresolvedCatalogRelation`.
   - Then `HoodieResolveReferences` is applied. In it, Hudi takes the `table` object (still an `UnresolvedRelation`) from `InsertIntoStatement` and tries to determine whether it is a Hudi table. The original logic converts the `UnresolvedRelation` directly to a `TableIdentifier`, but if the name is in three-part format, an exception is thrown, as in this case.
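   The failing conversion can be sketched as follows. This is a minimal, self-contained mimic of the behavior of Spark's `MultipartIdentifierHelper.asTableIdentifier` (not Spark's actual code): a `TableIdentifier` can only carry one or two name parts, so a three-part name has nowhere to go and the helper throws.

```scala
// Toy stand-in for Spark's TableIdentifier: at most a table name plus a database.
case class TableIdentifier(table: String, database: Option[String] = None)

object IdentifierSketch {
  // Mimics the shape of MultipartIdentifierHelper.asTableIdentifier:
  // one- and two-part names fit; anything longer throws.
  def asTableIdentifier(parts: Seq[String]): TableIdentifier = parts match {
    case Seq(tbl)     => TableIdentifier(tbl)
    case Seq(db, tbl) => TableIdentifier(tbl, Some(db))
    case _ =>
      throw new IllegalArgumentException(
        s"${parts.mkString(".")} is not a valid TableIdentifier: more than 2 name parts")
  }
}
```

   So `asTableIdentifier(Seq("spark_catalog", "db", "target_table"))` throws, which is exactly what happens when `HoodieResolveReferences` hits a three-part name in the first iteration.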
   
   But for vanilla Spark, which does not have the `HoodieResolveReferences` rule, the `table` object of `InsertIntoStatement` is resolved by `ResolveTables` **in the second iteration** over the Analyzer's `Resolution` batch. That is why Spark on its own can resolve this correctly.
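   The second-iteration behavior comes from the Analyzer running each batch to a fixed point: the rule list is re-applied until the plan stops changing, so a node that one rule could not resolve in pass one can still be resolved by that rule in pass two. A toy sketch of the mechanism (hypothetical rules over a set of "unresolved" markers, not Spark's `RuleExecutor`):

```scala
object FixedPointSketch {
  type Plan = Set[String] // the set of still-unresolved markers, for illustration
  type Rule = Plan => Plan

  // Apply the rule list repeatedly until the plan stops changing (fixed point).
  def runToFixedPoint(rules: Seq[Rule], plan: Plan): Plan = {
    val next = rules.foldLeft(plan)((p, rule) => rule(p))
    if (next == plan) plan else runToFixedPoint(rules, next)
  }

  // Stand-in for a rule that can resolve the target table only
  // once the query side has already been resolved.
  val resolveTable: Rule = p => if (!p("queryUnresolved")) p - "tableUnresolved" else p
  // Stand-in for a rule that resolves the query side.
  val resolveQuery: Rule = p => p - "queryUnresolved"
}
```

   With the rules ordered `Seq(resolveTable, resolveQuery)`, pass one only resolves the query, and pass two resolves the table, leaving the empty set. The real dependency between Spark's rules is different, but the "finished on the second iteration" shape is the same.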
   
   I think the root cause is that Hudi injects `HoodieResolveReferences` into the `Resolution` batch rather than into `Post-Hoc Resolution`; if it were injected there, Hudi might never need to resolve `UnresolvedXXXX` nodes by itself. But that is a big change and needs a separate issue to track.
   So this PR is the controllable solution: it fixes the user's case with only a few changes to the current Hudi architecture.
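   The shape of the fix in this PR can be sketched like this (a toy in-memory catalog, not Spark's `SessionCatalog`; the names `CatalogTableSketch`, `ResolveSketch`, and the provider check are illustrative assumptions): instead of force-converting the `UnresolvedRelation`'s name parts to a `TableIdentifier`, look the name up and return an `Option[CatalogTable]`, so a three-part name either resolves or yields `None` instead of throwing.

```scala
// Toy stand-in for Spark's CatalogTable metadata.
case class CatalogTableSketch(table: String, provider: String)

object ResolveSketch {
  // Toy catalog keyed by fully qualified name; provider "hudi" marks Hudi tables.
  private val tables = Map(
    "spark_catalog.db.target_table" -> CatalogTableSketch("target_table", "hudi"))

  // Lookup instead of conversion: a three-part name is just another key.
  def resolve(nameParts: Seq[String]): Option[CatalogTableSketch] =
    tables.get(nameParts.mkString("."))

  // The "is this a Hudi table?" question becomes a safe Option check.
  def isHoodieTable(nameParts: Seq[String]): Boolean =
    resolve(nameParts).exists(_.provider.equalsIgnoreCase("hudi"))
}
```

   With this shape, `isHoodieTable(Seq("spark_catalog", "db", "target_table"))` answers the question without ever constructing a two-part `TableIdentifier`.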



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
