Sharmadha Sainath created ATLAS-1948:
----------------------------------------

             Summary: Importing hive_table in a database which is a CTAS of 
another table in different database throws exception due to export order.
                 Key: ATLAS-1948
                 URL: https://issues.apache.org/jira/browse/ATLAS-1948
             Project: Atlas
          Issue Type: Bug
          Components:  atlas-core
    Affects Versions: 0.9-incubating
            Reporter: Sharmadha Sainath
            Priority: Critical
             Fix For: 0.9-incubating
         Attachments: ImportTransformsErrorOnCTASonDiffDB.txt

1. Created 2 databases, db1 and db2, in cluster1.
2. Created 2 tables:
      1. db1.t1
      2. db2.t2 as select * from db1.t1
3. Exported db1.t1 into a zip file.
4. Imported the zip file into cluster2 with the transforms option:
{code}
{
  "options": {
   "transforms": "{ \"hive_column\": { \"qualifiedName\": [ \"replace:cl1:cl2\" ]} }"
  }
}
{code}
5. Import fails with 
{code}
{"errorCode":"ATLAS-500-00-001","errorMessage":"org.apache.atlas.exception.AtlasBaseException: ObjectId is not valid AtlasObjectId{guid='51c77c1e-265e-46ab-bbb5-5316cf80a53c', typeName='hive_column', uniqueAttributes={}}"}
{code}

Only db1.t1 is imported into Atlas without any lineage. 

Attached the exception stack trace.
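
The transforms option in step 4 is a JSON document embedded as a string inside the outer JSON body, which is why its inner quotes are escaped. A minimal Python sketch of building that body with the standard json module instead of escaping by hand follows. The variable names are illustrative only, and submitting the body to Atlas is not shown:

```python
import json

# Transform spec from step 4: for hive_column entities, rewrite
# qualifiedName using the "replace:cl1:cl2" rule.
transforms = {"hive_column": {"qualifiedName": ["replace:cl1:cl2"]}}

# The outer options object carries the transforms as a JSON *string*,
# so the inner document is serialized first and then embedded.
request_body = json.dumps(
    {"options": {"transforms": json.dumps(transforms)}}, indent=2
)
print(request_body)

# Round trip: decoding twice recovers the original transform spec.
decoded = json.loads(json.loads(request_body)["options"]["transforms"])
```

Serializing twice keeps the escaped inner string on a single line, so it cannot be broken by a stray newline.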

After this, exporting db2.t2 and importing it completes successfully.
That is, the first import, of either db1.t1 or db2.t2, fails with the exception.
The next import is successful.

The exception *doesn't* happen, and the tables are imported successfully, if both 
tables are in a single database. The export order when the tables are in the same db is:
1. table1
2. db
3. table2
4. hive_process
5. hive_column_lineage

If the tables are in different dbs, the order is:
1. table1
2. db1
3. hive_process
4. hive_column_lineage
5. ctas table
6. db2
which is possibly causing the issue.

When cluster2 starts importing, it imports table1 and db1. When it reaches 
hive_column_lineage, it finds that a column referenced by hive_column_lineage is 
not in cluster2 yet, since the ctas table comes after hive_column_lineage in the 
import order, and it throws "ObjectId is not valid 
AtlasObjectId{guid='51c77c1e-265e-46ab-bbb5-5316cf80a53c', 
typeName='hive_column', uniqueAttributes={}}".
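
The ordering problem described above can be illustrated with a small Python sketch. This is a simplified, hypothetical model and not Atlas code: it assumes entities are imported strictly in export order, and it models only the single reference the analysis names (hive_column_lineage needing the columns of the ctas table). The function name and the entity labels are made up for illustration:

```python
# Export orders as listed in this report.
SAME_DB_ORDER = ["table1", "db", "table2", "hive_process", "hive_column_lineage"]
CROSS_DB_ORDER = ["table1", "db1", "hive_process", "hive_column_lineage",
                  "ctas_table", "db2"]


def first_unresolved(order, requires):
    """Return (entity, missing) for the first entity whose required
    entities have not been imported yet, or None if the order works."""
    imported = set()
    for entity in order:
        missing = [dep for dep in requires.get(entity, []) if dep not in imported]
        if missing:
            return entity, missing
        imported.add(entity)
    return None


# Same db: the ctas table (table2) precedes the column lineage.
same_db = first_unresolved(SAME_DB_ORDER, {"hive_column_lineage": ["table2"]})

# Different dbs: the column lineage precedes the ctas table it refers to.
cross_db = first_unresolved(CROSS_DB_ORDER, {"hive_column_lineage": ["ctas_table"]})

print(same_db)   # None
print(cross_db)  # ('hive_column_lineage', ['ctas_table'])
```

Under these assumptions the same-db order resolves cleanly, while the cross-db order stops at hive_column_lineage, which matches the behaviour observed in step 5.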

Thanks [~ayubkhan] for the analysis.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
