njalan opened a new issue, #5253:
URL: https://github.com/apache/hudi/issues/5253
I am trying to get column lineage from the Spark SQL query plan. Below is the SQL I use
for testing; all of the tables are Hudi tables.
insert into test.datahub_3
select a.email, b.phone
from test.datahub_1 a, test.datahub_2 b
where a.phone=b.phone
Below is the code:
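import org.apache.spark.internal.Logging
import org.apache.spark.sql.execution.QueryExecution

object LineageParser extends Logging {
  // Print the class of every node in the analyzed logical plan
  def lineageParser(qe: QueryExecution): Unit = {
    val analyzedLogicPlan = qe.analyzed
    logInfo("----------- start analyzed plan --------")
    analyzedLogicPlan.foreach { plan =>
      println(plan.getClass)
    }
    logInfo("----------- end analyzed plan --------")
  }
}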
Below is the output when the query runs on Hive tables:
**class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
class org.apache.spark.sql.catalyst.plans.logical.Project
class org.apache.spark.sql.catalyst.plans.logical.Join
class org.apache.spark.sql.catalyst.plans.logical.Project
class org.apache.spark.sql.catalyst.plans.logical.Filter
class org.apache.spark.sql.execution.datasources.LogicalRelation
class org.apache.spark.sql.catalyst.plans.logical.Project
class org.apache.spark.sql.catalyst.plans.logical.Filter
class org.apache.spark.sql.execution.datasources.LogicalRelation**
Below is the output when the query runs on Hudi tables:
**class org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand**
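The two outputs are completely different. For the Hudi tables, the analyzed plan contains only a single InsertIntoHoodieTableCommand node, so none of the Project, Join, Filter, or LogicalRelation nodes needed for column lineage are visited.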
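A likely explanation, hedged because it depends on Spark 3.0.x and Hudi 0.9 internals: in Spark 3.0, Command nodes report no children, so TreeNode.foreach stops at the command itself. InsertIntoHiveTable is a DataWritingCommand, whose children is its query, which is why the Hive traversal continues into Project, Join and so on. Assuming InsertIntoHoodieTableCommand is a RunnableCommand that keeps the SELECT as an ordinary constructor field rather than as a child, foreach never reaches it. The sketch below (PlanWalker.allNodes is a hypothetical helper, not a Spark or Hudi API) also descends into any LogicalPlan a command stores in its fields:

```scala
import org.apache.spark.sql.catalyst.plans.logical.{Command, LogicalPlan}

object PlanWalker {
  // Hypothetical helper: pre-order walk that also follows logical plans a
  // Command keeps in its constructor fields instead of exposing as children.
  def allNodes(plan: LogicalPlan): Seq[LogicalPlan] = {
    val hidden: Seq[LogicalPlan] = plan match {
      case cmd: Command =>
        // Any LogicalPlan-typed field that is not already a child
        // (e.g. the source query of an INSERT ... SELECT command).
        cmd.productIterator
          .collect { case p: LogicalPlan => p }
          .filterNot(p => cmd.children.contains(p))
          .toList
      case _ => Seq.empty
    }
    plan +: (plan.children ++ hidden).flatMap(allNodes)
  }
}
```

Inside lineageParser, PlanWalker.allNodes(qe.analyzed).foreach(n => println(n.getClass)) should then print the nodes of the inner SELECT as well. Two caveats of this sketch: it also returns the target table's relation if the command stores it as a plan field, and it does not look inside collection-typed fields such as a Seq[LogicalPlan].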
Environment Description
* Hudi version : 0.9
* Spark version : 3.0.1
* Hive version : 3.2
* Hadoop version : 3.2.1
* Storage (HDFS/S3/GCS..) : s3
* Running on Docker? (yes/no) : no