[
https://issues.apache.org/jira/browse/HIVE-29616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HIVE-29616:
----------------------------------
Labels: pull-request-available (was: )
> Incorrect column lineage when multiple subqueries with identical table aliases
> ------------------------------------------------------------------------------
>
> Key: HIVE-29616
> URL: https://issues.apache.org/jira/browse/HIVE-29616
> Project: Hive
> Issue Type: Bug
> Components: lineage
> Affects Versions: 1.1.0
> Reporter: jinqi long
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.3.0
>
>
> The logic in ExprProcFactory#findSourceColumn resolves source columns from
> TopOps by matching table and field aliases. If a match is found, it returns
> the result directly. This implementation fails in scenarios involving
> multiple subqueries with identical table aliases (e.g., in a UNION
> statement). Because the search returns the first match it encounters, it may
> link to the wrong source column from a different subquery branch, leading to
> incorrect lineage. For example:
> {code:sql}
> create table table_3 as
> select id1 from table_1 t1 where t1.id2 = 1
> union all
> select id1 from table_2 t1 where t1.id2 = 2;{code}
>
> The current result is:
> {code:json}
> {
> "version": "1.0",
> "engine": "tez",
> "database": "default",
> "hash": "24a0f860f60a1b7d5f350fd8eb164a37",
> "queryText": "create table table_3 as\nselect id1 from table_1 t1 where t1.id2 = 1\nunion all\nselect id1 from table_2 t1 where t1.id2 = 2",
> "edges": [
> {
> "sources": [
> 1,
> 2
> ],
> "targets": [
> 0
> ],
> "expression": "id1",
> "edgeType": "PROJECTION"
> },
> {
> "sources": [
> 3
> ],
> "targets": [
> 0
> ],
> "expression": "(t1.id2 = 1)",
> "edgeType": "PREDICATE"
> },
> {
> "sources": [
> 3
> ],
> "targets": [
> 0
> ],
> "expression": "(t1.id2 = 2)",
> "edgeType": "PREDICATE"
> }
> ],
> "vertices": [
> {
> "id": 0,
> "vertexType": "COLUMN",
> "vertexId": "default.table_3.id1"
> },
> {
> "id": 1,
> "vertexType": "COLUMN",
> "vertexId": "default.table_1.id1"
> },
> {
> "id": 2,
> "vertexType": "COLUMN",
> "vertexId": "default.table_2.id1"
> },
> {
> "id": 3,
> "vertexType": "COLUMN",
> "vertexId": "default.table_1.id2"
> }
> ]
> }{code}
> In the current result, both PREDICATE edges point to vertex 3
> (default.table_1.id2), and default.table_2.id2 never appears as a vertex.
> The correct result should be two PREDICATE edges:
> "sources": [default.table_1.id2], "targets": [default.table_3.id1]
> "sources": [default.table_2.id2], "targets": [default.table_3.id1]
> and one PROJECTION edge:
> "sources": [default.table_1.id1, default.table_2.id1], "targets": [default.table_3.id1]
>
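The first-match behaviour described in the issue can be illustrated with a simplified, self-contained model. This is only a sketch and not Hive's actual code: the `Scan` class and the `firstMatch` and `scopedMatch` methods are hypothetical names introduced here, and the issue does not state how the real fix scopes the lookup. The sketch assumes each table scan can be tagged with the UNION branch it belongs to.

```java
import java.util.List;

class AliasLookupSketch {
    // Hypothetical stand-in for a table scan registered under an alias
    // inside one branch of a UNION. Not a Hive class.
    static final class Scan {
        final int branch;
        final String alias;
        final String table;

        Scan(int branch, String alias, String table) {
            this.branch = branch;
            this.alias = alias;
            this.table = table;
        }
    }

    // Mirrors the reported behaviour: return the first scan whose alias
    // matches, regardless of which branch the expression came from.
    static String firstMatch(List<Scan> scans, String alias, String column) {
        for (Scan s : scans) {
            if (s.alias.equals(alias)) {
                return s.table + "." + column;
            }
        }
        return null;
    }

    // One possible remedy (an assumption, not the actual patch): only
    // consider scans that belong to the same branch as the expression.
    static String scopedMatch(List<Scan> scans, int branch, String alias, String column) {
        for (Scan s : scans) {
            if (s.branch == branch && s.alias.equals(alias)) {
                return s.table + "." + column;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Both branches of the UNION use the alias t1, as in the example query.
        List<Scan> scans = List.of(
            new Scan(0, "t1", "default.table_1"),
            new Scan(1, "t1", "default.table_2"));

        // Predicate (t1.id2 = 2) belongs to branch 1.
        System.out.println(firstMatch(scans, "t1", "id2"));
        System.out.println(scopedMatch(scans, 1, "t1", "id2"));
    }
}
```

In this model, the unscoped lookup resolves the second branch's `t1.id2` to `default.table_1.id2`, which matches the duplicated source vertex in the reported lineage, while the branch-scoped lookup resolves it to `default.table_2.id2`.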
--
This message was sent by Atlassian Jira
(v8.20.10#820010)