[ 
https://issues.apache.org/jira/browse/OOZIE-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314859#comment-15314859
 ] 

Purshotam Shah commented on OOZIE-2522:
---------------------------------------

bq. Won't this cause commands to be requeued repeatedly in a loop if there is 
an issue for a longer time and cause queue overflow?
Yes. There are two ways to handle this.
1. If there is a lock issue, it's better to requeue the command, so that as 
soon as the problem is fixed, Oozie can continue running without further delay.
2. Don't requeue the command and let the recovery service pick it up.

In my opinion, option 1 is better because it has less impact on SLA. If there 
is a lock issue, no service will run, so the queue size may stay the same. And 
if the queue does overflow, the command will be picked up by the recovery 
service, which is the same as option 2.
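
As a rough illustration of option 1, here is a minimal sketch. The class, enum, 
and method names are hypothetical and are not Oozie's actual command or queue 
code; the lock check is passed in as a plain boolean supplier, and the recovery 
pass is only represented by a return value.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names are real Oozie classes. */
public class RequeueSketch {

    /** Outcome of one attempt to run a lock-guarded command. */
    enum Outcome { EXECUTED, REQUEUED, LEFT_FOR_RECOVERY }

    /** A bounded queue of command names standing in for the command queue. */
    static final class BoundedQueue {
        private final Deque<String> items = new ArrayDeque<>();
        private final int capacity;

        BoundedQueue(int capacity) {
            this.capacity = capacity;
        }

        /** Returns false instead of growing past capacity (overflow). */
        boolean offer(String command) {
            if (items.size() >= capacity) {
                return false;
            }
            items.addLast(command);
            return true;
        }

        int size() {
            return items.size();
        }
    }

    /**
     * Option 1: if the lock cannot be acquired, put the command back on the
     * queue. If the queue is full, drop the command and rely on a later
     * recovery pass, which is the same end result as option 2.
     */
    static Outcome runOrRequeue(String command, BooleanSupplier tryLock, BoundedQueue queue) {
        if (tryLock.getAsBoolean()) {
            // Lock acquired: the real command body would run here.
            return Outcome.EXECUTED;
        }
        return queue.offer(command) ? Outcome.REQUEUED : Outcome.LEFT_FOR_RECOVERY;
    }

    public static void main(String[] args) {
        BoundedQueue queue = new BoundedQueue(1);
        BooleanSupplier lockUnavailable = () -> false;
        System.out.println(runOrRequeue("status-update-1", lockUnavailable, queue));
        System.out.println(runOrRequeue("status-update-2", lockUnavailable, queue));
        System.out.println(runOrRequeue("status-update-3", () -> true, queue));
    }
}
```

In this sketch a full queue simply rejects the command, so the worst case of 
option 1 degrades to option 2 rather than growing the queue without bound.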


> There can be multiple coord submit from bundle in case of ZK glitch
> -------------------------------------------------------------------
>
>                 Key: OOZIE-2522
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2522
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Purshotam Shah
>            Assignee: Purshotam Shah
>         Attachments: OOZIE-2522-V1.patch, OOZIE-2522-V2.patch, 
> OOZIE-2522-V3.patch
>
>
> Bundle queues a coord submit command to create the coord job. 
> CoordSubmitXCommand doesn't acquire any lock. CoordSubmitXCommand inserts 
> entries into the DB and calls BundleStatusUpdateXCommand to update the bundle 
> action. If there is a ZK glitch at the same time, BundleStatusUpdateXCommand 
> will fail (because it needs to acquire a lock) and RecoveryService will submit 
> duplicate jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
