[
https://issues.apache.org/jira/browse/OOZIE-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314859#comment-15314859
]
Purshotam Shah commented on OOZIE-2522:
---------------------------------------
bq. Won't this cause commands to be requeued repeatedly in a loop if there is
an issue for a longer time and cause queue overflow?
Yes. There are two ways to handle this.
1. If there is a lock issue, it's better to requeue the command, so that as
soon as the problem is fixed, Oozie can continue running without further delay.
2. Don't queue the command and let the recovery service pick it up.
In my opinion, option 1 is better because it has less impact on SLA, and if
there is a lock issue, no service will run and the queue size may remain the
same. And if there is an overflow, the command will be picked up by the
recovery service, which is the same as option 2.
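
To make the two options concrete, here is a minimal sketch of the decision
described above. It is not Oozie code: RequeueSketch, BoundedQueue, Outcome
and run() are hypothetical names, and the lock check is just a supplier
standing in for a real lock acquisition. It only shows that option 1 falls
back to option 2 when the queue is full.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names are real Oozie classes. */
public class RequeueSketch {

    enum Outcome { EXECUTED, REQUEUED, LEFT_FOR_RECOVERY }

    /** Bounded command queue standing in for the real command queue (assumption). */
    static final class BoundedQueue {
        private final Queue<String> items = new ArrayDeque<>();
        private final int capacity;

        BoundedQueue(int capacity) {
            this.capacity = capacity;
        }

        /** Returns false instead of growing past capacity (the overflow case). */
        boolean offer(String command) {
            if (items.size() >= capacity) {
                return false;
            }
            return items.add(command);
        }

        int size() {
            return items.size();
        }
    }

    /**
     * Runs a command that needs a lock. tryLock stands in for the lock
     * acquisition that fails during a ZK glitch (assumption).
     */
    static Outcome run(String command, BooleanSupplier tryLock, BoundedQueue queue) {
        if (tryLock.getAsBoolean()) {
            return Outcome.EXECUTED;
        }
        // Option 1: requeue so the command is retried once the lock issue clears.
        if (queue.offer(command)) {
            return Outcome.REQUEUED;
        }
        // Queue overflow: drop the command and rely on the recovery service,
        // which is the same end state as option 2.
        return Outcome.LEFT_FOR_RECOVERY;
    }

    public static void main(String[] args) {
        BoundedQueue queue = new BoundedQueue(1);
        System.out.println(run("status-update", () -> true, queue));
        System.out.println(run("status-update", () -> false, queue));
        System.out.println(run("status-update", () -> false, queue));
    }
}
```

In this sketch the queue never grows past its capacity, so a long-lived lock
problem costs at most one queue's worth of retries before the recovery
service takes over.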
> There can be multiple coord submit from bundle in case of ZK glitch
> -------------------------------------------------------------------
>
> Key: OOZIE-2522
> URL: https://issues.apache.org/jira/browse/OOZIE-2522
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Attachments: OOZIE-2522-V1.patch, OOZIE-2522-V2.patch,
> OOZIE-2522-V3.patch
>
>
> The bundle queues a coord submit command to create the coord job.
> CoordSubmitXCommand doesn't acquire any lock. CoordSubmitXCommand inserts
> entries into the DB and calls BundleStatusUpdateXCommand to update the bundle
> action. At the same time, if there is any ZK glitch,
> BundleStatusUpdateXCommand will fail (because it needs to acquire a lock) and
> RecoveryService will submit duplicate jobs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)