[ https://issues.apache.org/jira/browse/ATLAS-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dharshana M Krishnamoorthy updated ATLAS-4595:
----------------------------------------------
    Description: 
Scenario:

Use --filename in the import script along with --output so that the v2 API is 
invoked.

E.g.:
{code:java}
export JAVA_HOME=/usr/java/default;
/opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename /tmp/file_tejqc.txt --output /tmp/db_okgbi.zip
{code}
Steps:
 # Create 2 databases db_1 and db_2
 # Create 2 tables under each db
 # Run the import using a file (passed via --filename) that contains the database name db_1 (see the sketch below)
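
A minimal reproduction sketch of the steps above, assuming a HiveServer2 JDBC URL and simple table definitions (the URL, column definitions, and file paths below are placeholders; the actual run used db_hive_db_dumeh with table_1 and table_2):
{code:bash}
# Reproduction sketch: JDBC URL and table definitions are illustrative.
beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" -e "
  CREATE DATABASE IF NOT EXISTS db_1;
  CREATE TABLE IF NOT EXISTS db_1.table_1 (id INT, name STRING);
  CREATE TABLE IF NOT EXISTS db_1.table_2 (id INT, name STRING);
  CREATE DATABASE IF NOT EXISTS db_2;
  CREATE TABLE IF NOT EXISTS db_2.table_1 (id INT, name STRING);
  CREATE TABLE IF NOT EXISTS db_2.table_2 (id INT, name STRING);
"

# Put only the first database's name in the file passed via --filename,
# then run the import so the v2 (--output/zip) path is exercised.
echo "db_1" > /tmp/import_dbs.txt
export JAVA_HOME=/usr/java/default
/opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh \
  --filename /tmp/import_dbs.txt --output /tmp/db_1_export.zip
{code}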

The import was successful, but the entities are not reflected in Atlas:
{code:java}
2022-04-28 10:50:52,693|INFO|MainThread|machine.py:185 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|RUNNING: ssh -l root -i /tmp/hw-qe-keypair.pem -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null quasar-jagdkt-5.quasar-jagdkt.root.hwx.site "sudo -u root sh -c 'export JAVA_HOME=/usr/java/default; /opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename /tmp/file_tejqc.txt --output /tmp/db_okgbi.zip'"
2022-04-28 10:50:52,957|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Using Hive configuration directory [/etc/hive/conf]
2022-04-28 10:50:53,152|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|/etc/hive/conf:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/./:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/.//*
2022-04-28 10:50:53,152|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Log file for import is /var/log/atlas/import-hive.log
2022-04-28 10:50:55,328|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.
2022-04-28 10:50:55,329|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.
2022-04-28 10:51:18,889|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: An illegal reflective access operation has occurred
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Illegal reflective access by org.apache.hadoop.hive.common.StringInternUtils (file:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/jars/hive-exec-3.1.3000.7.1.8.0-581.jar) to field java.net.URI.string
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.hive.common.StringInternUtils
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
2022-04-28 10:51:18,891|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: All illegal access operations will be denied in a future release
2022-04-28 10:51:20,824|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Hive Meta Data imported successfully!
2022-04-28 10:51:20,850|INFO|MainThread|machine.py:227 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Exit Code: 0
{code}
 

Additional details: content of /tmp/file_tejqc.txt
{code:java}
cat /tmp/file_tejqc.txt
db_hive_db_dumeh {code}
Tables in the db:
{code:java}
0: jdbc:hive2://quasar-jagdkt-1.quasar-jagdkt> use db_hive_db_dumeh;
INFO  : Compiling 
command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463): use 
db_hive_db_dumeh
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling 
command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463); Time 
taken: 0.016 seconds
INFO  : Executing 
command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463): use 
db_hive_db_dumeh
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing 
command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463); Time 
taken: 0.007 seconds
INFO  : OK
No rows affected (0.036 seconds)
0: jdbc:hive2://quasar-jagdkt-1.quasar-jagdkt> show tables;
INFO  : Compiling 
command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314): show 
tables
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, 
type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314); Time 
taken: 0.134 seconds
INFO  : Executing 
command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314): show 
tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing 
command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314); Time 
taken: 0.015 seconds
INFO  : OK
+-----------+
| tab_name  |
+-----------+
| table_1   |
| table_2   |
+-----------+
2 rows selected (0.688 seconds) {code}
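
One way to confirm whether the import actually created the entities (a sketch, assuming the Atlas v2 DSL search REST endpoint; the Atlas host, port, and credentials below are placeholders):
{code:bash}
# Query Atlas for the hive_db entity the import should have created.
# Host, port, and admin credentials are illustrative placeholders.
curl -s -G -u admin:admin "http://<atlas-host>:21000/api/atlas/v2/search/dsl" \
  --data-urlencode 'query=hive_db where name="db_hive_db_dumeh"'
# An empty "entities" array in the response means the database entity is
# still missing from Atlas, matching the behaviour reported above.
{code}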
 

  was:
Scenario:

Use --filename in the import script along with --output so that the v2 API is 
invoked.

E.g.:
{code:java}
/opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename /tmp/file_hqavs.txt --output /tmp/db_axmqv.zip
{code}
There is some delay (a few seconds) before it reflects in Atlas.

Steps:
 # Create 2 databases db_1 and db_2
 # Run the import using a file that contains the tables belonging to db_1

When a search is performed immediately after the import, the data is not 
reflected in Atlas; if we wait for 5 seconds and then search again, the data is 
reflected.
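
A minimal polling sketch of that check, assuming the Atlas v2 DSL search REST endpoint (the Atlas host, credentials, and database name are placeholders, not taken from the reproduction):
{code:bash}
#!/usr/bin/env bash
# Poll the Atlas DSL search API until the imported hive_db shows up, to see
# how long the entities take to appear after import-hive.sh returns.
# Host, port, credentials, and database name are illustrative placeholders.
ATLAS_URL="http://<atlas-host>:21000"
DB_NAME="db_1"

for i in $(seq 1 30); do
  count=$(curl -s -G -u admin:admin "${ATLAS_URL}/api/atlas/v2/search/dsl" \
            --data-urlencode "query=hive_db where name=\"${DB_NAME}\"" \
          | grep -c '"guid"')
  if [ "${count}" -gt 0 ]; then
    echo "Entity visible after ~${i} second(s)"
    exit 0
  fi
  sleep 1
done
echo "Entity still not visible after 30 seconds"
exit 1
{code}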

This does not happen in the following scenarios:
 # when the v1 API is used
 # when the v2 API is used with a database name
 # when the v2 API is used with a table name

*It happens only when the v2 API is used along with a file name.*

This is not a blocker bug, as the data does reflect in Atlas.

But creating this issue to find out why this happens only while using a file 
name with the v2 API.

 

 

        Summary: [Hive import v2]When using file name to import via v2 api, the 
entities are not reflected in atlas though the import is successful  (was: 
[Hive import v2] [Performance]When using file name to import via v2 api, there 
is some delay before the entities are reflected in atlas)

> [Hive import v2]When using file name to import via v2 api, the entities are 
> not reflected in atlas though the import is successful
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-4595
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4595
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core
>            Reporter: Dharshana M Krishnamoorthy
>            Priority: Major
>


