[ 
https://issues.apache.org/jira/browse/ATLAS-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaitali borole updated ATLAS-4903:
-----------------------------------
    Description: 
In the example below, an entity of type hive_process_execution refers to a 
hive_process entity ("guid": "a2fc8760-8906-454c-8ad8-23b1fffa7fdb", 
"typeName": "hive_process") through its relationshipAttributes:
"entity": {
            "typeName": "hive_process_execution",
            "attributes": {
                "hostName": "",
                "qualifiedName": "cm:6135:db_hive_mig_hive.db_hive_mig_hive_tbl_00@cm:1698112413000:6155:1698112413000:1698112463140",
                "name": "cm:6135:db_hive_mig_hive.db_hive_mig_hive_tbl_00@cm:1698112413000:6155:1698112413000:1698112463140",
                "queryText": "insert into db_hive_mig_hive.db_hive_mig_hive_tbl_00...2023-10-24T01:53:33.000Z",
                "startTime": 1698112413000,
                "queryPlan": "Not Supported",
                "endTime": 1698112463140,
                "userName": "hive/quasar-cpvvvn-1.quasar-cpvvvn.root.hwx.s...@qe-infra-ad.cloudera.com",
                "queryId": "",
                "owner": null,
                "displayName": null,
                "description": null,
                "userDescription": null
            },
            "guid": "b98ef015-6bd9-4343-ad85-24628aa76731",
            "isIncomplete": false,
            "provenanceType": 0,
            "status": "ACTIVE",
            "createTime": 1698112413000,
            "updateTime": 1698112413000,
            "version": 0,
            "relationshipAttributes": {
                "process": {
                    "guid": "a2fc8760-8906-454c-8ad8-23b1fffa7fdb",
                    "typeName": "hive_process"
                }
            },
            "customAttributes": {
                "__nav_engineType": "\"MR\""
            },
            "businessAttributes": {},
            "proxy": false
        }

However, the hive_process entity with "guid": 
"a2fc8760-8906-454c-8ad8-23b1fffa7fdb" does not list the above 
hive_process_execution in its "processExecutions": [] block.
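
A minimal sketch of how such one-sided references could be spotted in 
exported entity JSON before a migration is run. It assumes the entities are 
already loaded as Python dicts and that "processExecutions" sits under 
"relationshipAttributes" of the hive_process entity; the function name 
find_missing_back_references is hypothetical and not part of Atlas.

```python
def find_missing_back_references(entities):
    """Return (execution_guid, process_guid) pairs where the hive_process
    entity does not list the hive_process_execution that points at it."""
    by_guid = {entity["guid"]: entity for entity in entities}
    missing = []
    for entity in entities:
        if entity.get("typeName") != "hive_process_execution":
            continue
        ref = entity.get("relationshipAttributes", {}).get("process")
        if not ref:
            continue
        process = by_guid.get(ref["guid"])
        if process is None:
            continue  # the referenced process is not in this batch
        # Assumption: the back-reference list lives under relationshipAttributes.
        listed = {
            item["guid"]
            for item in process.get("relationshipAttributes", {}).get(
                "processExecutions", []
            )
        }
        if entity["guid"] not in listed:
            missing.append((entity["guid"], ref["guid"]))
    return missing


# Sample data reduced to the fields that matter; GUIDs taken from the report.
entities = [
    {
        "typeName": "hive_process",
        "guid": "a2fc8760-8906-454c-8ad8-23b1fffa7fdb",
        "relationshipAttributes": {"processExecutions": []},
    },
    {
        "typeName": "hive_process_execution",
        "guid": "b98ef015-6bd9-4343-ad85-24628aa76731",
        "relationshipAttributes": {
            "process": {
                "guid": "a2fc8760-8906-454c-8ad8-23b1fffa7fdb",
                "typeName": "hive_process",
            }
        },
    },
]

print(find_missing_back_references(entities))
```

With the two entities above, the check reports the single 
execution/process pair described in this issue.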

The relationship edge is therefore created only when the 
hive_process_execution entity is processed. If the hive_process entity is 
processed before that, the import finds its relationship attribute empty, 
assumes the existing edges are unused, and tries to delete them.
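
The ordering problem described above can be illustrated with a toy model. 
This is only a sketch under stated assumptions, not Atlas code: ToyGraph, 
import_process, import_execution and run_migration are hypothetical names, 
and the graph is reduced to a set of (process, execution) edges plus a 
counter of deleted edges.

```python
PROCESS = "a2fc8760-8906-454c-8ad8-23b1fffa7fdb"
EXECUTION = "b98ef015-6bd9-4343-ad85-24628aa76731"


class ToyGraph:
    """Toy stand-in for the graph store: active edges and a deletion counter."""

    def __init__(self):
        self.active_edges = set()
        self.deleted_count = 0

    def import_process(self, guid, execution_guids):
        # Compare incoming relationship attributes with the stored edges.
        incoming = {(guid, execution) for execution in execution_guids}
        existing = {edge for edge in self.active_edges if edge[0] == guid}
        # Edges missing from the incoming entity are treated as unused.
        for edge in existing - incoming:
            self.active_edges.remove(edge)
            self.deleted_count += 1
        self.active_edges |= incoming

    def import_execution(self, guid, process_guid):
        # The execution carries the reference, so the edge is (re)created here.
        self.active_edges.add((process_guid, guid))


def run_migration(graph):
    # The process arrives first, with an empty "processExecutions" list.
    graph.import_process(PROCESS, [])
    graph.import_execution(EXECUTION, PROCESS)


graph = ToyGraph()
counts = []
for _ in range(3):  # initial run followed by two restarts
    run_migration(graph)
    counts.append(graph.deleted_count)

print(counts)
```

In this model the first run deletes nothing, while every restart deletes 
and then recreates the same edge, so the deletion counter grows with each 
rerun even though the final set of active edges is unchanged.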

When a large migration is restarted, the count of deleted entities keeps 
accumulating because of this issue, which slows the migration down 
considerably and makes it take longer than expected to process the data.

  was:
When Atlas is in migration mode, add a flag to handle multiple reruns of a 
migration that was interrupted for any reason.
When a migration is interrupted and the same file is rerun, all entities 
currently go through a comparison of new and old edges, resulting in the 
deletion of edges and vertices in the DB.



>  When migration restarts it results into deletion of edges and vertices
> -----------------------------------------------------------------------
>
>                 Key: ATLAS-4903
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4903
>             Project: Atlas
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: chaitali borole
>            Assignee: chaitali borole
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
