[
https://issues.apache.org/jira/browse/ATLAS-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Na Li updated ATLAS-3290:
-------------------------
Description:
The column name in an Impala lineage record may not contain its database name
and table name.
To get the database name and table name, we should use the metadata in the
vertex instead of assuming that the column name contains them.
When we assume that the column name always contains its database name and
table name, we run into the following exception:
{code}
I0618 19:16:02.415920 209817 QueryEventHookManager.java:212] Initiating onQueryComplete: org.apache.atlas.impala.hook.ImpalaLineageHook
E0618 19:16:02.418964 210738 ImpalaLineageHook.java:126] ImpalaLineageHook.process(): failed to process query create table sales_sg as select * from sales_asia
Java exception follows:
java.lang.IllegalArgumentException: fullColumnName {} does not contain database name or table name
    at org.apache.atlas.impala.hook.AtlasImpalaHookContext.getQualifiedNameForColumn(AtlasImpalaHookContext.java:115)
    at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getQualifiedName(BaseImpalaEvent.java:164)
    at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getQualifiedName(BaseImpalaEvent.java:134)
    at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getColumnEntities(BaseImpalaEvent.java:495)
    at org.apache.atlas.impala.hook.events.BaseImpalaEvent.toTableEntity(BaseImpalaEvent.java:430)
    at org.apache.atlas.impala.hook.events.BaseImpalaEvent.toTableEntity(BaseImpalaEvent.java:393)
    at org.apache.atlas.impala.hook.events.BaseImpalaEvent.toAtlasEntity(BaseImpalaEvent.java:315)
    at org.apache.atlas.impala.hook.events.BaseImpalaEvent.getInputOutputEntity(BaseImpalaEvent.java:297)
    at org.apache.atlas.impala.hook.events.CreateImpalaProcess.getEntities(CreateImpalaProcess.java:103)
    at org.apache.atlas.impala.hook.events.CreateImpalaProcess.getNotificationMessages(CreateImpalaProcess.java:54)
    at org.apache.atlas.impala.hook.ImpalaLineageHook.process(ImpalaLineageHook.java:122)
    at org.apache.atlas.impala.hook.ImpalaLineageHook.process(ImpalaLineageHook.java:79)
    at org.apache.atlas.impala.hook.ImpalaHook.onQueryComplete(ImpalaHook.java:36)
    at org.apache.atlas.impala.hook.ImpalaLineageHook.onQueryComplete(ImpalaLineageHook.java:52)
    at org.apache.impala.hooks.QueryEventHookManager.lambda$null$1(QueryEventHookManager.java:215)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
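The lineage record from Impala is shown below. The vertices for the target
table (ids 0 and 2) have only the bare column name as vertexId; the database
name and table name appear in metadata.tableName: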
{code}
{
  "queryText":"create table sales_china as select * from sales_asia",
  "queryId":"2940d0b242de53ea:e82ba8d300000000",
  "hash":"a705a9ec851a5440afca0dfb8df86cd5",
  "user":"root",
  "timestamp":1560885032,
  "endTime":1560885040,
  "edges":[
    {
      "sources":[1],
      "targets":[0],
      "edgeType":"PROJECTION"
    },
    {
      "sources":[3],
      "targets":[2],
      "edgeType":"PROJECTION"
    }
  ],
  "vertices":[
    {
      "id":0,
      "vertexType":"COLUMN",
      "vertexId":"id",
      "metadata":{
        "tableName":"sales_db.sales_china",
        "tableCreateTime":1560885039
      }
    },
    {
      "id":1,
      "vertexType":"COLUMN",
      "vertexId":"sales_db.sales_asia.id",
      "metadata":{
        "tableName":"sales_db.sales_asia",
        "tableCreateTime":1560884919
      }
    },
    {
      "id":2,
      "vertexType":"COLUMN",
      "vertexId":"name",
      "metadata":{
        "tableName":"sales_db.sales_china",
        "tableCreateTime":1560885039
      }
    },
    {
      "id":3,
      "vertexType":"COLUMN",
      "vertexId":"sales_db.sales_asia.name",
      "metadata":{
        "tableName":"sales_db.sales_asia",
        "tableCreateTime":1560884919
      }
    }
  ]
}
{code}
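The approach proposed in the description can be sketched roughly as follows.
This is only an illustration, not the actual patch: the class
VertexColumnResolver, its ColumnRef holder, and the clusterName parameter are
hypothetical names, and the "db.table.column@cluster" shape of the qualified
name is an assumption about the naming convention. The sketch takes the
database and table from metadata.tableName and uses only the last segment of
vertexId as the column name, so it handles both "id" and
"sales_db.sales_asia.id".

```java
/** Hypothetical helper: derives database, table and column from a lineage vertex. */
final class VertexColumnResolver {

    /** Simple holder for the three parts of a column reference. */
    static final class ColumnRef {
        final String database;
        final String table;
        final String column;

        ColumnRef(String database, String table, String column) {
            this.database = database;
            this.table = table;
            this.column = column;
        }
    }

    /**
     * Resolves a column reference from a vertex.
     *
     * @param vertexId          the vertex's "vertexId" (may or may not be qualified)
     * @param metadataTableName the vertex's "metadata.tableName", expected as "db.table"
     */
    static ColumnRef resolve(String vertexId, String metadataTableName) {
        if (vertexId == null || vertexId.isEmpty()) {
            throw new IllegalArgumentException("vertexId is empty");
        }
        if (metadataTableName == null) {
            throw new IllegalArgumentException("vertex has no metadata.tableName");
        }

        // The database and table names come from the metadata, never from vertexId.
        String[] tableParts = metadataTableName.split("\\.");
        if (tableParts.length != 2) {
            throw new IllegalArgumentException(
                "tableName " + metadataTableName + " is not of the form db.table");
        }

        // The column name is whatever follows the last dot, or the whole
        // vertexId when it is not qualified at all.
        int lastDot = vertexId.lastIndexOf('.');
        String column = lastDot < 0 ? vertexId : vertexId.substring(lastDot + 1);

        return new ColumnRef(tableParts[0], tableParts[1], column);
    }

    /** Builds a qualified name; the "@cluster" suffix is an assumed convention. */
    static String qualifiedName(ColumnRef ref, String clusterName) {
        return ref.database + "." + ref.table + "." + ref.column + "@" + clusterName;
    }
}
```

For vertex 0 of the record above, resolve("id", "sales_db.sales_china") yields
database sales_db, table sales_china and column id, which is the case that
currently fails.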
was:
The column name in Impala lineage record may not contain its database name and
its table name.
To get its its database name and its table name, we should use the metadata in
a vertex, not assuming column name contains its database name and its table
name.
> Impala Hook: Get database name and table name from vertex metadata
> ------------------------------------------------------------------
>
> Key: ATLAS-3290
> URL: https://issues.apache.org/jira/browse/ATLAS-3290
> Project: Atlas
> Issue Type: New Feature
> Components: atlas-core
> Affects Versions: 2.1.0
> Reporter: Na Li
> Assignee: Na Li
> Priority: Major
>