wsk1314zwr commented on issue #4693:
URL: https://github.com/apache/kyuubi/issues/4693#issuecomment-1562992318
I think it is necessary to enhance the table lineage for input tables. I have
the following SQL scenario:
```sql
insert overwrite table dev_datalineage.test_where_field
select
a.field1,
a.field2,
a.field3
from dev_datalineage.join_table_a a
JOIN dev_datalineage.join_table_b b on a.field1 = b.field1
JOIN dev_datalineage.join_table_c c on a.field1 = c.field1
where c.field2='2';
```
The lineage information parsed by the **hive lineage hook**:
```json
{
"version": "1.0.0-0526",
"sqlType": "HiveSQL",
"collectTime": "1685021684190",
"operationName": "QUERY",
"vertices": [{
"id": 0,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.test_where_field.field1"
}, {
"id": 1,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.test_where_field.field2"
}, {
"id": 2,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.test_where_field.field3"
}, {
"id": 3,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_a.field1"
}, {
"id": 4,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_a.field2"
}, {
"id": 5,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_a.field3"
}, {
"id": 6,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_b.field1"
}, {
"id": 7,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_c.field1"
}, {
"id": 8,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_c.field2"
}],
"edges": [{
"sources": [3],
"targets": [0],
"edgeType": "PROJECTION"
}, {
"sources": [4],
"targets": [1],
"edgeType": "PROJECTION"
}, {
"sources": [5],
"targets": [2],
"edgeType": "PROJECTION"
}, {
"sources": [3, 6, 7],
"targets": [0, 1, 2],
"expression": "(a.field1 = b.field1 AND a.field1 = c.field1)",
"edgeType": "PREDICATE"
}, {
"sources": [8],
"targets": [0, 1, 2],
"expression": "(c.field2 = '2')",
"edgeType": "PREDICATE"
}]
}
```
The table-level lineage from **join_table_b and join_table_c to
test_where_field** can be obtained from the Hive hook's lineage information,
even though these two tables have no field-level lineage. The current Kyuubi
lineage plugin cannot provide this, so the table-level lineage parsed by the
Hive hook is more complete. More complete table-level lineage avoids wrongly
concluding, during data governance, that a table has no downstream output table.
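
To make the point concrete, here is a minimal Python sketch of how table-level lineage could be derived from the hook output above. It assumes every `vertexId` has the form `db.table.column`. The names `table_lineage` and `sample` are hypothetical and not part of Hive or Kyuubi, and `sample` is a trimmed copy of the JSON above, not the full output.

```python
from collections import defaultdict


def table_lineage(lineage, edge_types=("PROJECTION", "PREDICATE")):
    """Collapse column-level edges into {target_table: [source_tables]}.

    Assumption: each COLUMN vertexId looks like "db.table.column".
    """
    # Map vertex id -> "db.table" by dropping the trailing column name.
    table_of = {
        v["id"]: v["vertexId"].rsplit(".", 1)[0]
        for v in lineage["vertices"]
        if v["vertexType"] == "COLUMN"
    }
    result = defaultdict(set)
    for edge in lineage["edges"]:
        if edge["edgeType"] not in edge_types:
            continue
        sources = {table_of[i] for i in edge["sources"] if i in table_of}
        for target_id in edge["targets"]:
            if target_id in table_of:
                result[table_of[target_id]] |= sources
    return {table: sorted(srcs) for table, srcs in result.items()}


# Trimmed-down version of the hook output shown above (illustration only).
sample = {
    "vertices": [
        {"id": 0, "vertexType": "COLUMN",
         "vertexId": "dev_datalineage.test_where_field.field1"},
        {"id": 3, "vertexType": "COLUMN",
         "vertexId": "dev_datalineage.join_table_a.field1"},
        {"id": 6, "vertexType": "COLUMN",
         "vertexId": "dev_datalineage.join_table_b.field1"},
        {"id": 7, "vertexType": "COLUMN",
         "vertexId": "dev_datalineage.join_table_c.field1"},
        {"id": 8, "vertexType": "COLUMN",
         "vertexId": "dev_datalineage.join_table_c.field2"},
    ],
    "edges": [
        {"sources": [3], "targets": [0], "edgeType": "PROJECTION"},
        {"sources": [3, 6, 7], "targets": [0], "edgeType": "PREDICATE"},
        {"sources": [8], "targets": [0], "edgeType": "PREDICATE"},
    ],
}

print(table_lineage(sample))
print(table_lineage(sample, edge_types=("PROJECTION",)))
```

With `PREDICATE` edges included, join_table_b and join_table_c show up as inputs of test_where_field. With `PROJECTION` edges only, join_table_a is the sole input, which is the gap described above.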
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]