wsk1314zwr commented on issue #4693:
URL: https://github.com/apache/kyuubi/issues/4693#issuecomment-1562992318

   
   I think the table lineage for input tables needs to be enhanced. Consider 
the following SQL scenario:
   ```sql
   insert overwrite table dev_datalineage.test_where_field
   select 
          a.field1,
          a.field2,
          a.field3
   from dev_datalineage.join_table_a a 
   JOIN dev_datalineage.join_table_b b on a.field1 = b.field1 
   JOIN dev_datalineage.join_table_c c on a.field1 = c.field1 
   where c.field2='2';
   ```
   The lineage information parsed by the **Hive lineage hook**:
   ```json
   {
        "version": "1.0.0-0526",
        "sqlType": "HiveSQL",
        "collectTime": "1685021684190",
        "operationName": "QUERY",
        "vertices": [{
                "id": 0,
                "vertexType": "COLUMN",
                "vertexId": "dev_datalineage.test_where_field.field1"
        }, {
                "id": 1,
                "vertexType": "COLUMN",
                "vertexId": "dev_datalineage.test_where_field.field2"
        }, {
                "id": 2,
                "vertexType": "COLUMN",
                "vertexId": "dev_datalineage.test_where_field.field3"
        }, {
                "id": 3,
                "vertexType": "COLUMN",
                "vertexId": "dev_datalineage.join_table_a.field1"
        }, {
                "id": 4,
                "vertexType": "COLUMN",
                "vertexId": "dev_datalineage.join_table_a.field2"
        }, {
                "id": 5,
                "vertexType": "COLUMN",
                "vertexId": "dev_datalineage.join_table_a.field3"
        }, {
                "id": 6,
                "vertexType": "COLUMN",
                "vertexId": "dev_datalineage.join_table_b.field1"
        }, {
                "id": 7,
                "vertexType": "COLUMN",
                "vertexId": "dev_datalineage.join_table_c.field1"
        }, {
                "id": 8,
                "vertexType": "COLUMN",
                "vertexId": "dev_datalineage.join_table_c.field2"
        }],
        "edges": [{
                "sources": [3],
                "targets": [0],
                "edgeType": "PROJECTION"
        }, {
                "sources": [4],
                "targets": [1],
                "edgeType": "PROJECTION"
        }, {
                "sources": [5],
                "targets": [2],
                "edgeType": "PROJECTION"
        }, {
                "sources": [3, 6, 7],
                "targets": [0, 1, 2],
                "expression": "(a.field1 = b.field1 AND a.field1 = c.field1)",
                "edgeType": "PREDICATE"
        }, {
                "sources": [8],
                "targets": [0, 1, 2],
                "expression": "(c.field2 = '2')",
                "edgeType": "PREDICATE"
        }]
   }
   ```
   The table-level lineage between **join_table_b, join_table_c, and 
test_where_field** can be derived from the Hive hook's lineage information 
even though these input tables have no field-level lineage to the output: 
they appear only in the PREDICATE edges. The current Kyuubi lineage plugin 
does not capture this, so the table-level lineage parsed by the Hive hook is 
more complete. More complete table-level lineage avoids misjudging an input 
table as having no downstream output table during data governance.
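   To illustrate the point, here is a minimal sketch of how table-level 
lineage could be derived from output shaped like the JSON above. It is not 
the Hive hook's or the Kyuubi plugin's implementation; the function names 
`table_of` and `table_lineage` are hypothetical, and `sample` is a trimmed 
copy of the output above rather than the full document.

```python
from collections import defaultdict


def table_of(vertex_id: str) -> str:
    """Drop the trailing column name: 'db.table.col' -> 'db.table'."""
    return vertex_id.rsplit(".", 1)[0]


def table_lineage(lineage: dict) -> dict:
    """Map each target table to the set of source tables feeding it.

    Both PROJECTION and PREDICATE edges are counted, so tables that only
    appear in join or filter conditions are still reported as inputs.
    """
    id_to_table = {
        v["id"]: table_of(v["vertexId"])
        for v in lineage["vertices"]
        if v["vertexType"] == "COLUMN"
    }
    result = defaultdict(set)
    for edge in lineage["edges"]:
        for target in edge["targets"]:
            for source in edge["sources"]:
                result[id_to_table[target]].add(id_to_table[source])
    return dict(result)


# Trimmed sample (assumption: same shape as the hook output shown above).
sample = {
    "vertices": [
        {"id": 0, "vertexType": "COLUMN",
         "vertexId": "dev_datalineage.test_where_field.field1"},
        {"id": 3, "vertexType": "COLUMN",
         "vertexId": "dev_datalineage.join_table_a.field1"},
        {"id": 6, "vertexType": "COLUMN",
         "vertexId": "dev_datalineage.join_table_b.field1"},
        {"id": 8, "vertexType": "COLUMN",
         "vertexId": "dev_datalineage.join_table_c.field2"},
    ],
    "edges": [
        {"sources": [3], "targets": [0], "edgeType": "PROJECTION"},
        {"sources": [3, 6], "targets": [0], "edgeType": "PREDICATE"},
        {"sources": [8], "targets": [0], "edgeType": "PREDICATE"},
    ],
}

for target, sources in table_lineage(sample).items():
    print(target, "<-", sorted(sources))
```

   With the PREDICATE edges included, join_table_b and join_table_c show up 
as inputs of test_where_field; if only PROJECTION edges were followed, 
join_table_a would be the only input.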
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

