ulysses-you commented on PR #4694:
URL: https://github.com/apache/kyuubi/pull/4694#issuecomment-1504905533
I think we should define lineage clearly. The current lineage is easy to
follow: we only consider the plan's output, which means that if a column is used
only for a filter, a sort, or anything else that does not affect the schema, we
ignore it. If we want to consider those columns, how about adding a new mode that
fully extracts column and table lineage? For example:
```sql
INSERT INTO TABLE t
SELECT c1 FROM t1 WHERE c2 > 0 ORDER BY c3
-- The lineage should be:
ColumnUsage(to: String, from: String, usage: String)
Lineage(
  List("default.t1"),
  List("default.t"),
  List(
    ColumnUsage("c1", "default.t1.c1", "OUTPUT"),
    ColumnUsage("N/A", "default.t1.c2", "PREDICATE"),
    ColumnUsage("N/A", "default.t1.c3", "ORDERING")
  )
)
```
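
To make the proposal concrete, here is a minimal Scala sketch of the two modes. It is only an illustration: the field names (`inputTables`, `outputTables`, `columnUsages`), the `LineageSketch` object and the `outputOnly` helper are hypothetical and are not taken from the existing lineage plugin. Source columns are qualified with `default.t1`, the table the query reads from.

```scala
// Hypothetical model of the proposed "full" lineage mode.
// `to` is the output column ("N/A" when the source column does not reach the output),
// `from` is the fully qualified source column, `usage` is how the column is used.
final case class ColumnUsage(to: String, from: String, usage: String)

// Field names are assumptions made for this sketch.
final case class Lineage(
    inputTables: List[String],
    outputTables: List[String],
    columnUsages: List[ColumnUsage])

object LineageSketch {
  // Full lineage for:
  //   INSERT INTO TABLE t SELECT c1 FROM t1 WHERE c2 > 0 ORDER BY c3
  val full: Lineage = Lineage(
    List("default.t1"),
    List("default.t"),
    List(
      ColumnUsage("c1", "default.t1.c1", "OUTPUT"),
      ColumnUsage("N/A", "default.t1.c2", "PREDICATE"),
      ColumnUsage("N/A", "default.t1.c3", "ORDERING")))

  // The current behaviour as described above: keep only the columns
  // that contribute to the plan's output.
  def outputOnly(lineage: Lineage): Lineage =
    lineage.copy(columnUsages = lineage.columnUsages.filter(_.usage == "OUTPUT"))

  def main(args: Array[String]): Unit = {
    println(outputOnly(full).columnUsages) // output-only mode
    println(full.columnUsages)             // proposed full mode
  }
}
```

With this shape, the existing output-only view is just a filter over the full result, so a new mode would not need to change what the default mode reports.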