[
https://issues.apache.org/jira/browse/ATLAS-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dharshana M Krishnamoorthy updated ATLAS-4595:
----------------------------------------------
Description:
Scenario:
Use --filename in the import script along with --output so that the v2 API is
invoked.
E.g.:
{code:java}
export JAVA_HOME=/usr/java/default;
/opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename
/tmp/file_tejqc.txt --output /tmp/db_okgbi.zip{code}
Steps:
# Create 2 databases db_1 and db_2
# Create 2 tables under each db
# Run the import with --filename pointing to a file that contains the name of database db_1
The import reports success, but the entities are not reflected in Atlas.
{code:java}
2022-04-28 10:50:52,693|INFO|MainThread|machine.py:185 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|RUNNING: ssh -l root -i /tmp/hw-qe-keypair.pem -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null quasar-jagdkt-5.quasar-jagdkt.root.hwx.site "sudo -u root sh -c 'export JAVA_HOME=/usr/java/default; /opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename /tmp/file_tejqc.txt --output /tmp/db_okgbi.zip'"
2022-04-28 10:50:52,957|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Using Hive configuration directory [/etc/hive/conf]
2022-04-28 10:50:53,152|INFO|MainThread|machine.py:200 -
run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|/etc/hive/conf:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/./:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/.//*
2022-04-28 10:50:53,152|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Log file for import is /var/log/atlas/import-hive.log
2022-04-28 10:50:55,328|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.
2022-04-28 10:50:55,329|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.
2022-04-28 10:51:18,889|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: An illegal reflective access operation has occurred
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Illegal reflective access by org.apache.hadoop.hive.common.StringInternUtils (file:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/jars/hive-exec-3.1.3000.7.1.8.0-581.jar) to field java.net.URI.string
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.hive.common.StringInternUtils
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
2022-04-28 10:51:18,891|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: All illegal access operations will be denied in a future release
2022-04-28 10:51:20,824|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Hive Meta Data imported successfully!
2022-04-28 10:51:20,850|INFO|MainThread|machine.py:227 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Exit Code: 0
{code}
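Since the script exits with code 0 and prints a success message, one way to narrow this down might be to check whether the archive written via --output contains any data at all. The internal layout of that zip is not described in this report, so the sketch below makes no assumption about it and only lists member names and sizes. The function names are hypothetical helpers, and the demo builds a throwaway archive rather than reading /tmp/db_okgbi.zip.

```python
import os
import tempfile
import zipfile


def summarize_zip(path):
    """Return (member name, uncompressed size) pairs for a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def non_empty_members(path):
    """Return the names of members that actually carry data."""
    return [name for name, size in summarize_zip(path) if size > 0]


if __name__ == "__main__":
    # Demo on a throwaway archive; the member names here are made up.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.zip")
        with zipfile.ZipFile(demo, "w") as archive:
            archive.writestr("entity.json", "{}")
            archive.writestr("empty.json", "")
        for name, size in summarize_zip(demo):
            print(name, size)
```

Pointing `summarize_zip` at the real output file would show whether the import produced an empty archive or whether the entities were written but not picked up afterwards.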
Additional details: file_tejqc.txt file content
{code:java}
cat /tmp/file_tejqc.txt
db_hive_db_dumeh {code}
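For anyone reproducing this, the input file and command line can be assembled as in the sketch below. The script path and the --filename and --output flags are taken from this report. The one-name-per-line layout for multiple entries is an assumption (the report only shows a single-line file), and the helper names and example file names are hypothetical. The sketch only builds the command; it does not run it.

```python
import shlex
import tempfile
from pathlib import Path

# Script path as it appears in this report.
IMPORT_SCRIPT = "/opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh"


def write_import_file(path, names):
    """Write one name per line (assumed layout for more than one entry)."""
    Path(path).write_text("".join(f"{name}\n" for name in names))


def build_import_command(filename, output_zip):
    """Return the argv list for the import; nothing is executed here."""
    return [IMPORT_SCRIPT, "--filename", str(filename), "--output", str(output_zip)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        listing = Path(tmp) / "file_example.txt"  # hypothetical file name
        write_import_file(listing, ["db_hive_db_dumeh"])
        argv = build_import_command(listing, Path(tmp) / "db_example.zip")
        print(shlex.join(argv))
```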
Tables in the db:
{code:java}
0: jdbc:hive2://quasar-jagdkt-1.quasar-jagdkt> use db_hive_db_dumeh;
INFO : Compiling
command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463): use
db_hive_db_dumeh
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling
command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463); Time
taken: 0.016 seconds
INFO : Executing
command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463): use
db_hive_db_dumeh
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing
command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463); Time
taken: 0.007 seconds
INFO : OK
No rows affected (0.036 seconds)
0: jdbc:hive2://quasar-jagdkt-1.quasar-jagdkt> show tables;
INFO : Compiling
command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314): show
tables
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name,
type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling
command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314); Time
taken: 0.134 seconds
INFO : Executing
command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314): show
tables
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing
command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314); Time
taken: 0.015 seconds
INFO : OK
+-----------+
| tab_name |
+-----------+
| table_1 |
| table_2 |
+-----------+
2 rows selected (0.688 seconds) {code}
was:
Scenario:
Use --filename in the import script along with --output so that the v2 API is
invoked.
E.g.:
{code:java}
/opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename
/tmp/file_hqavs.txt --output /tmp/db_axmqv.zip {code}
There is a delay of a few seconds before the imported data is reflected in Atlas.
Steps:
# Create 2 databases db_1 and db_2
# Run the import with --filename pointing to a file that lists tables belonging to the first database
When a search is performed immediately after the import, the data is not
reflected in Atlas. If we wait for 5 seconds and then search again, the data is
reflected.
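To tell a short delay apart from entities that never appear, a test could poll instead of searching once. The sketch below is a generic retry helper, not part of Atlas or of the test harness in this report. The `check` callable is a placeholder for whatever search the test performs, and the timeout and interval values are arbitrary.

```python
import time


def wait_until(check, timeout=10.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or the timeout elapses.

    Returns True if check() succeeded within the timeout, False otherwise.
    check() is always called at least once.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Stand-in for a search that starts returning results on the third attempt.
    attempts = iter([False, False, True])
    print(wait_until(lambda: next(attempts), timeout=5.0, interval=0.1))
```

If the helper returns False even with a generous timeout, the entities are missing rather than late.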
This does not happen in the following scenarios:
# when the v1 API is used
# when the v2 API is used with a database name
# when the v2 API is used with a table name
*It happens only when the v2 API is used along with a file name.*
This is not a blocker bug, as the data is eventually reflected in Atlas.
This issue is raised to find out why this happens only when a file name is used
with the v2 API.
Summary: [Hive import v2] When using file name to import via v2 api, the
entities are not reflected in atlas though the import is successful (was:
[Hive import v2] [Performance] When using file name to import via v2 api, there
is some delay before the entities are reflected in atlas)
> [Hive import v2] When using file name to import via v2 api, the entities are
> not reflected in atlas though the import is successful
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: ATLAS-4595
> URL: https://issues.apache.org/jira/browse/ATLAS-4595
> Project: Atlas
> Issue Type: Bug
> Components: atlas-core
> Reporter: Dharshana M Krishnamoorthy
> Priority: Major
>