[
https://issues.apache.org/jira/browse/ATLAS-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
chaitali borole updated ATLAS-4903:
-----------------------------------
Description:
Here an entity of type hive_process_execution references a hive_process entity
("guid": "a2fc8760-8906-454c-8ad8-23b1fffa7fdb", "typeName": "hive_process")
in its relationship attributes, as shown below:
"entity": {
  "typeName": "hive_process_execution",
  "attributes": {
    "hostName": "",
    "qualifiedName": "cm:6135:db_hive_mig_hive.db_hive_mig_hive_tbl_00@cm:1698112413000:6155:1698112413000:1698112463140",
    "name": "cm:6135:db_hive_mig_hive.db_hive_mig_hive_tbl_00@cm:1698112413000:6155:1698112413000:1698112463140",
    "queryText": "insert into db_hive_mig_hive.db_hive_mig_hive_tbl_00...2023-10-24T01:53:33.000Z",
    "startTime": 1698112413000,
    "queryPlan": "Not Supported",
    "endTime": 1698112463140,
    "userName": "mailto:hive/quasar-cpvvvn-1.quasar-cpvvvn.root.hwx.s...@qe-infra-ad.cloudera.com",
    "queryId": "",
    "owner": null,
    "displayName": null,
    "description": null,
    "userDescription": null
  },
  "guid": "b98ef015-6bd9-4343-ad85-24628aa76731",
  "isIncomplete": false,
  "provenanceType": 0,
  "status": "ACTIVE",
  "createTime": 1698112413000,
  "updateTime": 1698112413000,
  "version": 0,
  "relationshipAttributes": {
    "process": {
      "guid": "a2fc8760-8906-454c-8ad8-23b1fffa7fdb",
      "typeName": "hive_process"
    }
  },
  "customAttributes": {
    "__nav_engineType": "\"MR\""
  },
  "businessAttributes": {},
  "proxy": false
}
But the entity with "guid": "a2fc8760-8906-454c-8ad8-23b1fffa7fdb" does not
list the above hive_process_execution in its "processExecutions": [] block.
The relationship edge is therefore created only when the
hive_process_execution is processed. Before that, when Atlas processes the
hive_process, it finds the relationship attribute empty, assumes the existing
edges are unused, and tries to delete them.
When a large migration is restarted, the count of deleted entities keeps
accumulating because of this issue, which slows the migration down
considerably and makes it take longer than expected to process the data.
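The sequence described above can be illustrated with a minimal sketch. This is
not Atlas code: edges_to_delete and the dict-based edge store are hypothetical
stand-ins, and the sketch assumes the importer diffs the stored edges against
the relationship list carried by the incoming entity.

```python
def edges_to_delete(stored_edges, incoming_refs):
    """Return stored edge targets that are absent from the incoming list.

    Hypothetical helper; it only mimics an old-vs-new edge comparison.
    """
    incoming = {ref["guid"] for ref in incoming_refs}
    return [guid for guid in stored_edges if guid not in incoming]


PROCESS_GUID = "a2fc8760-8906-454c-8ad8-23b1fffa7fdb"
EXECUTION_GUID = "b98ef015-6bd9-4343-ad85-24628aa76731"

# State left behind by the interrupted first run: the edge already exists.
stored = {PROCESS_GUID: [EXECUTION_GUID]}

# Rerun, step 1: hive_process arrives with "processExecutions": [].
incoming_process = {"guid": PROCESS_GUID, "processExecutions": []}
stale = edges_to_delete(stored[PROCESS_GUID],
                        incoming_process["processExecutions"])
stored[PROCESS_GUID] = [g for g in stored[PROCESS_GUID] if g not in stale]

# Rerun, step 2: hive_process_execution arrives and re-creates the same edge.
incoming_execution = {"guid": EXECUTION_GUID,
                      "process": {"guid": PROCESS_GUID}}
stored[incoming_execution["process"]["guid"]].append(incoming_execution["guid"])
```

In this toy model each rerun deletes one edge and then creates it again, which
matches the accumulating delete count reported above.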
was:
When Atlas is in migration mode, add a flag to handle multiple reruns of a
migration in case it is interrupted for any reason.
When a migration is interrupted and the same file is rerun, all the entities
currently go through a comparison of new and old edges, resulting in the
deletion of edges and vertices in the DB.
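One way such a flag could behave is sketched here. The issue does not say how
the flag is implemented, so reconcile_edges and its migration_rerun parameter
are assumptions made for illustration only, not Atlas APIs.

```python
def reconcile_edges(stored_edges, incoming_refs, migration_rerun=False):
    """Merge incoming relationship targets into the stored edge list.

    Hypothetical sketch: with migration_rerun=True the old-vs-new comparison
    is skipped, so no stored edge is deleted; new targets are still added.
    Returns a (kept_edges, deleted_edges) tuple.
    """
    incoming = [ref["guid"] for ref in incoming_refs]
    if migration_rerun:
        kept, deleted = list(stored_edges), []
    else:
        kept = [g for g in stored_edges if g in incoming]
        deleted = [g for g in stored_edges if g not in incoming]
    for guid in incoming:
        if guid not in kept:
            kept.append(guid)
    return kept, deleted


# Default behaviour: an empty incoming list deletes the stored edge.
default_result = reconcile_edges(["exec-1"], [])
# With the assumed flag set, the stored edge survives the rerun.
rerun_result = reconcile_edges(["exec-1"], [], migration_rerun=True)
```

Skipping deletion in this way would be safe only if the rerun replays the same
file, which is the case the description is concerned with.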
> When migration restarts it results into deletion of edges and vertices
> -----------------------------------------------------------------------
>
> Key: ATLAS-4903
> URL: https://issues.apache.org/jira/browse/ATLAS-4903
> Project: Atlas
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Reporter: chaitali borole
> Assignee: chaitali borole
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)