[
https://issues.apache.org/jira/browse/GOBBLIN-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apekshit Kumar updated GOBBLIN-1965:
------------------------------------
Description:
*Context :*
was:
*Context :*
Currently, job lock acquisition fails when the NameNode (NN) is unavailable,
which in turn causes the job to fail.
{code:java}
mysql> select deployment_id, status, count(*) from gobblin_job_queue where
created_date >= '2021-09-01' and created_date < '2021-10-01' and
failure_exception like '%NullPointerException%' group by deployment_id, status
order by deployment_id, status;
+---------------+--------+----------+
| deployment_id | status | count(*) |
+---------------+--------+----------+
|             1 | FAILED |      253 |
|             2 | FAILED |        6 |
|           230 | FAILED |      157 |
|         22702 | FAILED |       11 |
|         22703 | FAILED |       13 |
|         22704 | FAILED |        2 |
+---------------+--------+----------+
6 rows in set (1.04 sec)

mysql> select deployment_id, status, count(*) from gobblin_job_queue where
created_date >= '2021-08-01' and created_date < '2021-09-01' and
failure_exception like '%NullPointerException%' group by deployment_id, status
order by deployment_id, status;
+---------------+--------+----------+
| deployment_id | status | count(*) |
+---------------+--------+----------+
|             1 | FAILED |     1091 |
|             3 | FAILED |     1598 |
|           230 | FAILED |    15870 |
+---------------+--------+----------+
3 rows in set (1.18 sec)
{code}
*Acceptance Criteria:*
Job lock acquisition should be made resilient to NN issues, possibly by moving
locks to ZooKeeper (Zk) or by retrying lock acquisition when NN issues
(IOExceptions) occur.
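A minimal sketch of the retry option, for illustration only. The `TryLock` interface, the `acquireWithRetry` helper, and the attempt and backoff parameters are all hypothetical and are not Gobblin's actual lock API; the sketch only assumes that a transient NN problem surfaces as an IOException from the lock call.

```java
import java.io.IOException;

public class LockRetrySketch {

    /** Hypothetical stand-in for a job lock; NOT Gobblin's real lock interface. */
    interface TryLock {
        /** Returns true if the lock was acquired, false if someone else holds it. */
        boolean tryLock() throws IOException;
    }

    /**
     * Calls tryLock() up to maxAttempts times, retrying only when an
     * IOException is thrown (assumed here to indicate a transient NN issue).
     * The wait between attempts doubles each time, starting at
     * initialBackoffMillis. A normal true/false result is returned immediately.
     */
    static boolean acquireWithRetry(TryLock lock, int maxAttempts, long initialBackoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoff = initialBackoffMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lock.tryLock();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff); // wait before the next attempt
                    backoff *= 2;          // exponential backoff
                }
            }
        }
        // Every attempt failed with an IOException; surface the last one.
        throw last;
    }
}
```

With this shape, a lock that is legitimately held by another job (tryLock() returns false) is not retried; only IOExceptions are. Whether that distinction matches the real failure mode would need to be checked against the actual lock implementation.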
> Extending Hive data movement CDC check to support table regex lookup
> --------------------------------------------------------------------
>
> Key: GOBBLIN-1965
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1965
> Project: Apache Gobblin
> Issue Type: Bug
> Components: misc
> Affects Versions: 0.15.0
> Reporter: Apekshit Kumar
> Priority: Minor
>
> *Context :*
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)