[
https://issues.apache.org/jira/browse/GOBBLIN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apekshit Kumar updated GOBBLIN-1963:
------------------------------------
Description:
*Context:*
After a restart, the Gobblin service does not resume jobs that were in the
RUNNING/LAUNCHED/SUBMITTED state before the restart, so these jobs stay stuck
in that state.
*Example scenario:*
A job was in the LAUNCHED state and calculating CDC when the Application
Master was re-attempted because of a NameNode issue (any environment issue
can trigger this).
!2ac4a7d8-f168-4877-a583-03d62f1384ac.png|width=1179,height=574!
Job state in the DB:
```
mysql> select * from gobblin_job_queue where job_name='DM-JOB-fpti-druid-dp-venmo' order by created_date desc limit 10;
+------------------------------------------+----------------------------+---------------+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------------------------------+---------------------+---------------------+
| queue_id                                 | job_name                   | deployment_id | failure_exception | configs                                                                                                                                                                                                                         | status    | job_id                                       | created_date        | updated_date        |
+------------------------------------------+----------------------------+---------------+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------------------------------+---------------------+---------------------+
| DM-JOB-fpti-druid-dp-venmo_1630444318758 | DM-JOB-fpti-druid-dp-venmo |             2 | NULL              | {"dataset":{"batch_id":"20210831211155","name":"default._druid-test_dataproc-jobs_venmo","snapshot_id":"20210831211155"},"gobblin":{"client":{"id":"AIRFLOW_PAZ_DMP_DO"},"deployment":{"name":"DMP228"}},"namespace":"Chunnel"} | LAUNCHED  | job_DM-JOB-fpti-druid-dp-venmo_1630444325903 | 2021-08-31 21:12:00 | 2021-08-31 21:12:38 |
```
3. Similarly, other jobs got stuck:
```
mysql> SELECT TIMEDIFF(SYSDATE(), q.updated_date), q.* FROM gobblin_job_queue q WHERE q.status IN ('RUNNING', 'LAUNCHED', 'SUBMITTED') and TIMEDIFF(SYSDATE(), q.updated_date) > '05:00:00' and q.deployment_id=2 order by q.updated_date asc ;
Current database: dmpcloudpazdevdb
+-------------------------------------+---------------------------------------------------------+-------------------------------------------+---------------+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+-------------------------------------------------------------+---------------------+---------------------+
| TIMEDIFF(SYSDATE(), q.updated_date) | queue_id                                                | job_name                                  | deployment_id | failure_exception | configs                                                                                                                                                                                                                           | status   | job_id                                                      | created_date        | updated_date        |
+-------------------------------------+---------------------------------------------------------+-------------------------------------------+---------------+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+-------------------------------------------------------------+---------------------+---------------------+
| 38:58:08                            | DM-JOB-EDW.IDI-ACCT-LOGIN-EVENT-INCREMENT_1630444082133 | DM-JOB-EDW.IDI-ACCT-LOGIN-EVENT-INCREMENT |             2 | NULL              | {"dataset":{"batch_id":"20210831210801","name":"edw.idi_acct_login_event_increment","snapshot_id":"20210831210801"},"gobblin":{"client":{"id":"AIRFLOW_PAZ_DMP_DO"},"deployment":{"name":"DMP228"}},"namespace":"Chunnel"}        | LAUNCHED | job_DM-JOB-EDW.IDI-ACCT-LOGIN-EVENT-INCREMENT_1630444085787 | 2021-08-31 21:08:02 | 2021-08-31 21:08:52 |
| 38:56:57                            | DM-JOB-fpti-ps-xoom-be-cs_1630444165958                 | DM-JOB-fpti-ps-xoom-be-cs                 |             2 | NULL              | {"dataset":{"batch_id":"20210831210925","name":"default._sys_dt_fpti_polestar_xoom_be_cs","snapshot_id":"20210831210925"},"gobblin":{"client":{"id":"AIRFLOW_PAZ_DMP_DO"},"deployment":{"name":"DMP228"}},"namespace":"Chunnel"}  | LAUNCHED | job_DM-JOB-fpti-ps-xoom-be-cs_1630444176842                 | 2021-08-31 21:09:25 | 2021-08-31 21:10:03 |
| 38:56:52                            | DM-JOB-EDW.IDI-ACCT-VISITOR-INCREMENT_1630444169541     | DM-JOB-EDW.IDI-ACCT-VISITOR-INCREMENT     |             2 | NULL              | {"dataset":{"batch_id":"20210831210928","name":"edw.idi_acct_visitor_increment","snapshot_id":"20210831210928"},"gobblin":{"client":{"id":"AIRFLOW_PAZ_DMP_DO"},"deployment":{"name":"DMP228"}},"namespace":"Chunnel"}            | LAUNCHED | job_DM-JOB-EDW.IDI-ACCT-VISITOR-INCREMENT_1630444176871     | 2021-08-31 21:09:29 | 2021-08-31 21:10:08 |
| 38:55:02                            | DM-JOB-fpti-ps-xoom-be-ss_1630444285471                 | DM-JOB-fpti-ps-xoom-be-ss                 |             2 | NULL              | {"dataset":{"batch_id":"20210831211124","name":"default._sys_dt_fpti_polestar_xoom_be_ss","snapshot_id":"20210831211124"},"gobblin":{"client":{"id":"AIRFLOW_PAZ_DMP_DO"},"deployment":{"name":"DMP228"}},"namespace":"Chunnel"}  | LAUNCHED | job_DM-JOB-fpti-ps-xoom-be-ss_1630444295836                 | 2021-08-31 21:11:27 | 2021-08-31 21:11:58 |
| 38:54:22                            | DM-JOB-fpti-druid-dp-venmo_1630444318758                | DM-JOB-fpti-druid-dp-venmo                |             2 | NULL              | {"dataset":{"batch_id":"20210831211155","name":"default._druid-test_dataproc-jobs_venmo","snapshot_id":"20210831211155"},"gobblin":{"client":{"id":"AIRFLOW_PAZ_DMP_DO"},"deployment":{"name":"DMP228"}},"namespace":"Chunnel"}   | LAUNCHED | job_DM-JOB-fpti-druid-dp-venmo_1630444325903                | 2021-08-31 21:12:00 | 2021-08-31 21:12:38 |
| 38:53:10                            | DM-JOB-fpti-ps-venmo-be-ss_1630444384062                | DM-JOB-fpti-ps-venmo-be-ss                |             2 | NULL              | {"dataset":{"batch_id":"20210831211303","name":"default._sys_dt_fpti_polestar_venmo_be_ss","snapshot_id":"20210831211303"},"gobblin":{"client":{"id":"AIRFLOW_PAZ_DMP_DO"},"deployment":{"name":"DMP228"}},"namespace":"Chunnel"} | LAUNCHED | job_DM-JOB-fpti-ps-venmo-be-ss_1630444385853                | 2021-08-31 21:13:04 | 2021-08-31 21:13:50 |
| 38:53:04                            | DM-JOB-fpti-druid-dp-xoom_1630444365979                 | DM-JOB-fpti-druid-dp-xoom                 |             2 | NULL              | {"dataset":{"batch_id":"20210831211245","name":"default._druid-test_dataproc-jobs_xoom","snapshot_id":"20210831211245"},"gobblin":{"client":{"id":"AIRFLOW_PAZ_DMP_DO"},"deployment":{"name":"DMP228"}},"namespace":"Chunnel"}    | LAUNCHED | job_DM-JOB-fpti-druid-dp-xoom_1630444375836                 | 2021-08-31 21:12:46 | 2021-08-31 21:13:56 |
+-------------------------------------+---------------------------------------------------------+-------------------------------------------+---------------+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+-------------------------------------------------------------+---------------------+---------------------+
7 rows in set (0.58 sec)
```
4. Lock files are not released:
```
[05:11]:[pp_dmp_batch@lvspazetl227:~]$ hdfs dfs -ls -t -r hdfs://stampy/apps/datapipeline/cloud/common/locks
Found 8 items
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:08 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-EDW.IDI-ACCT-LOGIN-EVENT-INCREMENT.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:09 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-EDW.IDI-ACCT-VISITOR-INCREMENT.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:09 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-xoom-be-cs.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:11 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-xoom-be-ss.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:12 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-druid-dp-venmo.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:13 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-venmo-be-ss.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:13 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-druid-dp-xoom.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-09-02 05:09 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-muse-be-ss.lock
[05:12]:[pp_dmp_batch@lvspazetl227:~]$ hdfs dfs -ls -t -r hdfs://stampy/apps/datapipeline/cloud/common/locks | grep "08-31"
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:08 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-EDW.IDI-ACCT-LOGIN-EVENT-INCREMENT.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:09 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-EDW.IDI-ACCT-VISITOR-INCREMENT.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:09 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-xoom-be-cs.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:11 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-xoom-be-ss.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:12 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-druid-dp-venmo.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:13 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-venmo-be-ss.lock
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-08-31 14:13 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-druid-dp-xoom.lock
[05:12]:[pp_dmp_batch@lvspazetl227:~]$ hdfs dfs -ls -t -r hdfs://stampy/apps/datapipeline/cloud/common/locks | grep "08-31" | awk '{print $8}'
hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-EDW.IDI-ACCT-LOGIN-EVENT-INCREMENT.lock
hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-EDW.IDI-ACCT-VISITOR-INCREMENT.lock
hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-xoom-be-cs.lock
hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-xoom-be-ss.lock
hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-druid-dp-venmo.lock
hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-venmo-be-ss.lock
hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-druid-dp-xoom.lock
[05:13]:[pp_dmp_batch@lvspazetl227:~]$ hdfs dfs -ls -t -r hdfs://stampy/apps/datapipeline/cloud/common/locks | grep "08-31" | awk '{print $8}' | xargs hdfs dfs -rm -skipTrash
Deleted hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-EDW.IDI-ACCT-LOGIN-EVENT-INCREMENT.lock
Deleted hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-EDW.IDI-ACCT-VISITOR-INCREMENT.lock
Deleted hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-xoom-be-cs.lock
Deleted hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-xoom-be-ss.lock
Deleted hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-druid-dp-venmo.lock
Deleted hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-ps-venmo-be-ss.lock
Deleted hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-druid-dp-xoom.lock
[05:13]:[pp_dmp_batch@lvspazetl227:~]$
[05:13]:[pp_dmp_batch@lvspazetl227:~]$
[05:13]:[pp_dmp_batch@lvspazetl227:~]$ hdfs dfs -ls -t -r hdfs://stampy/apps/datapipeline/cloud/common/locks
Found 1 items
-rw-r--r-- 3 pp_dmp_batch pp_dmp_batch 0 2021-09-02 05:12 hdfs://stampy/apps/datapipeline/cloud/common/locks/DM-JOB-fpti-druid-dp-muse.lock
[05:13]:[pp_dmp_batch@lvspazetl227:~]$
```
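The manual cleanup above selects locks by date, which could also match locks that are still legitimately held. A narrower selection might start from the names of the stuck jobs instead. The following is only a sketch: it assumes lock files are named `<job_name>.lock` under the lock directory, as the listing suggests, and the helper names `lock_path` and `stale_locks` are hypothetical, not Gobblin APIs.

```python
def lock_path(lock_dir, job_name):
    """Build a lock path using the <job_name>.lock pattern seen in the listing."""
    return f"{lock_dir.rstrip('/')}/{job_name}.lock"


def stale_locks(existing_locks, stuck_job_names, lock_dir):
    """Return only the existing lock paths that belong to the given stuck jobs."""
    wanted = {lock_path(lock_dir, name) for name in stuck_job_names}
    return [path for path in existing_locks if path in wanted]


LOCK_DIR = "hdfs://stampy/apps/datapipeline/cloud/common/locks"
existing = [
    f"{LOCK_DIR}/DM-JOB-fpti-druid-dp-venmo.lock",  # lock of a stuck job
    f"{LOCK_DIR}/DM-JOB-fpti-ps-muse-be-ss.lock",   # newer lock, job not stuck
]
# Only the lock of the stuck job is selected; the newer lock is left alone.
to_delete = stale_locks(existing, ["DM-JOB-fpti-druid-dp-venmo"], LOCK_DIR)
print(to_delete)
```

The selected paths could then be passed to `hdfs dfs -rm -skipTrash` as in the session above.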
*Acceptance Criteria:*
# Gobblin jobs should resume even if GobblinAppMaster restarts before the jobs
are finalized.
# After the restart, the system should automatically resume jobs that were in
the RUNNING/LAUNCHED/SUBMITTED state.
# The solution should clean up lingering locks acquired in the previous run.
# Jobs and locks that other deployments are currently handling as part of work
stealing must not be picked up or cleaned.
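The resume and work-stealing criteria could be checked at startup with a filter like the one below. This is a minimal sketch, not the Gobblin implementation: it mirrors the SQL query from the description (active state, same deployment, not updated for more than five hours), and the function name `find_stuck_jobs`, the dict-shaped rows, the `FINISHED` status value, and the last two sample rows are hypothetical.

```python
from datetime import datetime, timedelta

# States the description lists as "in flight" at restart time.
ACTIVE_STATES = {"RUNNING", "LAUNCHED", "SUBMITTED"}


def find_stuck_jobs(rows, now, deployment_id, threshold=timedelta(hours=5)):
    """Return active rows owned by this deployment whose last update is older
    than the threshold, oldest first. Rows of other deployments are skipped."""
    stuck = [
        row for row in rows
        if row["status"] in ACTIVE_STATES
        and row["deployment_id"] == deployment_id
        and now - row["updated_date"] > threshold
    ]
    return sorted(stuck, key=lambda row: row["updated_date"])


rows = [
    # Two rows taken from the query output in the description.
    {"job_name": "DM-JOB-fpti-druid-dp-xoom", "deployment_id": 2,
     "status": "LAUNCHED", "updated_date": datetime(2021, 8, 31, 21, 13, 56)},
    {"job_name": "DM-JOB-fpti-druid-dp-venmo", "deployment_id": 2,
     "status": "LAUNCHED", "updated_date": datetime(2021, 8, 31, 21, 12, 38)},
    # Hypothetical rows: a job in a non-active state and one owned elsewhere.
    {"job_name": "hypothetical-finished-job", "deployment_id": 2,
     "status": "FINISHED", "updated_date": datetime(2021, 8, 31, 20, 0, 0)},
    {"job_name": "hypothetical-other-deployment-job", "deployment_id": 3,
     "status": "RUNNING", "updated_date": datetime(2021, 8, 31, 20, 0, 0)},
]
# Query time implied by the TIMEDIFF column (21:12:38 + 38:54:22).
now = datetime(2021, 9, 2, 12, 7, 0)
stuck_names = [row["job_name"] for row in find_stuck_jobs(rows, now, deployment_id=2)]
print(stuck_names)
```

Filtering on `deployment_id` is one way to avoid touching jobs that another deployment has taken over; whether that column alone is enough to detect work stealing is an open question for the fix.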
was:
*Context:*
Following a restart, Gobblin service is currently unable to process previous
jobs in the RUNNING/LAUNCHED/SUBMITTED state, resulting in a stuck state for
these jobs.
A job is in the LAUNCHED state, and while calculating CDC, the Application
master got re-attempted, actually due to name node issue (can be any env
issues).
*Acceptance Criteria:*
# The system should automatically resume jobs that were in the
RUNNING/LAUNCHED/SUBMITTED state after the restart.
# The solution should address lingering locks acquired in the previous run.
# Care should be taken to avoid picking up jobs or cleaning locks that are
currently being handled by other deployments as part of work stealing.
#
> Following the restart, jobs that were previously in the "RUNNING,"
> "LAUNCHED," or "SUBMITTED" state failed to resume.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: GOBBLIN-1963
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1963
> Project: Apache Gobblin
> Issue Type: Bug
> Components: misc
> Affects Versions: 0.15.0
> Reporter: Apekshit Kumar
> Priority: Minor
> Attachments: 2ac4a7d8-f168-4877-a583-03d62f1384ac.png
--
This message was sent by Atlassian Jira
(v8.20.10#820010)