[ 
https://issues.apache.org/jira/browse/ATLAS-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Li updated ATLAS-3290:
-------------------------
    Description: 
The column name in an Impala lineage record may not contain its database name 
and table name. 

To get the database name and table name, we should use the metadata in the 
vertex rather than assuming that the column name contains them. 

When we assume that the column name always contains its database name and 
table name, we run into the following exception:

{code}
I0618 19:16:02.415920 209817 QueryEventHookManager.java:212] Initiating onQueryComplete: org.apache.atlas.impala.hook.ImpalaLineageHook
E0618 19:16:02.418964 210738 ImpalaLineageHook.java:126] ImpalaLineageHook.process(): failed to process query create table sales_sg as select * from sales_asia
Java exception follows:
java.lang.IllegalArgumentException: fullColumnName {} does not contain database name or table name
        at org.apache.atlas.impala.hook.AtlasImpalaHookContext.getQualifiedNameForColumn(AtlasImpalaHookContext.java:115)
        at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getQualifiedName(BaseImpalaEvent.java:164)
        at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getQualifiedName(BaseImpalaEvent.java:134)
        at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getColumnEntities(BaseImpalaEvent.java:495)
        at org.apache.atlas.impala.hook.events.BaseImpalaEvent.toTableEntity(BaseImpalaEvent.java:430)
        at org.apache.atlas.impala.hook.events.BaseImpalaEvent.toTableEntity(BaseImpalaEvent.java:393)
        at org.apache.atlas.impala.hook.events.BaseImpalaEvent.toAtlasEntity(BaseImpalaEvent.java:315)
        at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getInputOutputEntity(BaseImpalaEvent.java:297)
        at org.apache.atlas.impala.hook.events.CreateImpalaProcess.getEntities(CreateImpalaProcess.java:103)
        at org.apache.atlas.impala.hook.events.CreateImpalaProcess.getNotificationMessages(CreateImpalaProcess.java:54)
        at org.apache.atlas.impala.hook.ImpalaLineageHook.process(ImpalaLineageHook.java:122)
        at org.apache.atlas.impala.hook.ImpalaLineageHook.process(ImpalaLineageHook.java:79)
        at org.apache.atlas.impala.hook.ImpalaHook.onQueryComplete(ImpalaHook.java:36)
        at org.apache.atlas.impala.hook.ImpalaLineageHook.onQueryComplete(ImpalaLineageHook.java:52)
        at org.apache.impala.hooks.QueryEventHookManager.lambda$null$1(QueryEventHookManager.java:215)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}

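The lineage record from Impala is shown here. Note that the target column 
vertices (ids 0 and 2) carry only the bare column name in vertexId, while 
metadata.tableName holds the database name and table name: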
{code}
{  
   "queryText":"create table sales_china as select * from sales_asia",
   "queryId":"2940d0b242de53ea:e82ba8d300000000",
   "hash":"a705a9ec851a5440afca0dfb8df86cd5",
   "user":"root",
   "timestamp":1560885032,
   "endTime":1560885040,
   "edges":[  
      {  
         "sources":[  
            1
         ],
         "targets":[  
            0
         ],
         "edgeType":"PROJECTION"
      },
      {  
         "sources":[  
            3
         ],
         "targets":[  
            2
         ],
         "edgeType":"PROJECTION"
      }
   ],
   "vertices":[  
      {  
         "id":0,
         "vertexType":"COLUMN",
         "vertexId":"id",
         "metadata":{  
            "tableName":"sales_db.sales_china",
            "tableCreateTime":1560885039
         }
      },
      {  
         "id":1,
         "vertexType":"COLUMN",
         "vertexId":"sales_db.sales_asia.id",
         "metadata":{  
            "tableName":"sales_db.sales_asia",
            "tableCreateTime":1560884919
         }
      },
      {  
         "id":2,
         "vertexType":"COLUMN",
         "vertexId":"name",
         "metadata":{  
            "tableName":"sales_db.sales_china",
            "tableCreateTime":1560885039
         }
      },
      {  
         "id":3,
         "vertexType":"COLUMN",
         "vertexId":"sales_db.sales_asia.name",
         "metadata":{  
            "tableName":"sales_db.sales_asia",
            "tableCreateTime":1560884919
         }
      }
   ]
}

{code}
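
To illustrate the proposed approach, here is a minimal sketch of deriving the database, table, and column names from a vertex's metadata instead of from vertexId alone. It is not the actual Atlas hook code: the class VertexNameResolver, its nested ColumnName type, and the resolve method are hypothetical names, and the sketch assumes that metadata.tableName always has the form "db.table" and that the column name is the last dot-separated segment of vertexId, as in the record above.

```java
// Hypothetical helper, not part of Atlas: resolves a column's database and
// table from the vertex metadata rather than from the vertexId string.
final class VertexNameResolver {

    /** Simple holder for the three parts of a column name (illustrative only). */
    static final class ColumnName {
        final String database;
        final String table;
        final String column;

        ColumnName(String database, String table, String column) {
            this.database = database;
            this.table = table;
            this.column = column;
        }

        /** Returns "db.table.column"; any cluster suffix is out of scope here. */
        String qualifiedName() {
            return database + "." + table + "." + column;
        }
    }

    /**
     * @param vertexId          the vertex's "vertexId" field, e.g. "id" or
     *                          "sales_db.sales_asia.id"
     * @param metadataTableName the vertex's "metadata.tableName" field,
     *                          assumed to look like "sales_db.sales_china"
     */
    static ColumnName resolve(String vertexId, String metadataTableName) {
        if (vertexId == null || vertexId.isEmpty()) {
            throw new IllegalArgumentException("vertexId is empty");
        }
        if (metadataTableName == null) {
            throw new IllegalArgumentException("vertex has no metadata.tableName");
        }

        // Split "db.table" at the first dot; both parts must be non-empty.
        int dot = metadataTableName.indexOf('.');
        if (dot <= 0 || dot == metadataTableName.length() - 1) {
            throw new IllegalArgumentException(
                "tableName '" + metadataTableName + "' is not of the form db.table");
        }
        String database = metadataTableName.substring(0, dot);
        String table = metadataTableName.substring(dot + 1);

        // The column is the last segment of vertexId, whether or not the
        // vertexId is prefixed with the database and table name.
        int lastDot = vertexId.lastIndexOf('.');
        String column = lastDot >= 0 ? vertexId.substring(lastDot + 1) : vertexId;

        return new ColumnName(database, table, column);
    }
}
```

With this sketch, vertex 0 above (vertexId "id", tableName "sales_db.sales_china") and vertex 1 (vertexId "sales_db.sales_asia.id", tableName "sales_db.sales_asia") both resolve to a full db.table.column name, so the bare vertexId no longer triggers the exception.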
 

  was:
The column name in Impala lineage record may not contain its database name and 
its table name. 

To get its database name and its table name, we should use the metadata in 
a vertex, not assuming column name contains its database name and its table 
name. 


> Impala Hook: Get database name and table name from vertex metadata
> ------------------------------------------------------------------
>
>                 Key: ATLAS-3290
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3290
>             Project: Atlas
>          Issue Type: New Feature
>          Components:  atlas-core
>    Affects Versions: 2.1.0
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Major



