iodone commented on PR #4694:
URL: https://github.com/apache/kyuubi/pull/4694#issuecomment-1506679998

   > I think we had better define lineage clearly. The current lineage is easy 
to follow because we only consider the plan's output, which means that if a 
column is used only for filtering, sorting, or anything else that does not 
affect the schema, we ignore it.
   > 
   > So if we want to consider those columns, how about adding a new mode to 
fully extract column and table lineage? For example:
   > 
   > ```sql
   > INSERT INTO TABLE t
   > SELECT c1 FROM t1 WHERE c2 > 0 ORDER BY c3
   > 
   > -- The lineage should be:
   > 
   > ColumnUsage(to: String, from: String, usage: String)
   > 
   > Lineage(
   >   List("default.t1"),
   >   List("default.t"),
   >   List(
   >      ColumnUsage("c1", "default.t1.c1", "OUTPUT"),
   >      ColumnUsage("N/A", "default.t1.c2", "PREDICATE"),
   >      ColumnUsage("N/A", "default.t1.c3", "ORDERING")
   >   )
   > )
   > ```
   
   Yes, from the perspective of column output, the current lineage definition 
is clear. The main purpose of this PR is to analyze lineage from the 
perspective of tables, and to treat every table involved in the SQL statement 
as an input table.
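
   To make the difference between the two views concrete, here is a minimal 
sketch. It is not the plugin's implementation: `LineageSketch`, `tableOf`, 
`inputTables`, and the sample query are hypothetical names chosen for 
illustration, and `ColumnUsage` only mirrors the shape proposed in the quoted 
comment.

```scala
object LineageSketch {
  // Hypothetical model, mirroring the shape proposed in the quoted comment.
  final case class ColumnUsage(to: String, from: String, usage: String)

  // Strip the trailing column name: "default.t1.c2" -> "default.t1".
  // Assumes names are qualified as db.table.column; a name without a dot
  // is returned unchanged.
  def tableOf(qualifiedColumn: String): String = {
    val lastDot = qualifiedColumn.lastIndexOf('.')
    if (lastDot < 0) qualifiedColumn else qualifiedColumn.substring(0, lastDot)
  }

  // outputOnly = true  -> only tables whose columns reach the output schema.
  // outputOnly = false -> every table touched by any usage (the table-level view).
  def inputTables(usages: List[ColumnUsage], outputOnly: Boolean): List[String] = {
    val relevant = if (outputOnly) usages.filter(_.usage == "OUTPUT") else usages
    relevant.map(u => tableOf(u.from)).distinct
  }

  // Hypothetical query, used only for this illustration:
  //   INSERT INTO TABLE t SELECT c1 FROM t1 WHERE c2 IN (SELECT c2 FROM t2)
  val example: List[ColumnUsage] = List(
    ColumnUsage("c1", "default.t1.c1", "OUTPUT"),
    ColumnUsage("N/A", "default.t1.c2", "PREDICATE"),
    ColumnUsage("N/A", "default.t2.c2", "PREDICATE")
  )
}
```

   With `outputOnly = true`, `LineageSketch.inputTables(LineageSketch.example, 
outputOnly = true)` yields only `default.t1`, because `t2` never contributes 
to an output column. With `outputOnly = false` it yields `default.t1` and 
`default.t2`, which is the table-level behavior this PR is after.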


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

