[ 
https://issues.apache.org/jira/browse/OOZIE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829743#comment-13829743
 ] 

Shwetha G S commented on OOZIE-1622:
------------------------------------

Looks like this code is the issue in RecoveryService.runBundleRecovery():
{noformat}
                    if (baction.getCoordId() == null) {
                        log.error("CoordId is null for Bundle action " + baction.getBundleActionId());
                        continue;
                    }
                    if (Services.get().get(JobsConcurrencyService.class).isJobIdForThisServer(baction.getCoordId())) {
                        if (baction.getStatus() == Job.Status.PREP) {
                            BundleJobBean bundleJob = null;
                            if (jpaService != null) {
                                bundleJob = BundleJobQueryExecutor.getInstance().get(
                                        BundleJobQuery.GET_BUNDLE_JOB_ID_JOBXML_CONF, baction.getBundleId());
                            }
                            if (bundleJob != null) {
                                Element bAppXml = XmlUtils.parseXml(bundleJob.getJobXml());
                                List<Element> coordElems = bAppXml.getChildren("coordinator", bAppXml.getNamespace());
                                for (Element coordElem : coordElems) {
                                    Attribute name = coordElem.getAttribute("name");
                                    if (name.getValue().equals(baction.getCoordName())) {
                                        Configuration coordConf = mergeConfig(coordElem, bundleJob);
                                        coordConf.set(OozieClient.BUNDLE_ID, baction.getBundleId());
                                        queueCallable(new CoordSubmitXCommand(coordConf,
                                                bundleJob.getId(), name.getValue()));
                                    }
                                }
                            }
                        }
{noformat}
This code submits a new coord whenever the bundle action is in PREP, is
pending, and has a non-null coord id. A non-null coord id means the coord has
already been created, so why create another one?
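
To make the question concrete, here is a minimal sketch of the decision the
snippet above makes, reduced to the coord id and the action status. The class
BundleRecoveryDecision, its Status enum, and both methods are hypothetical
names used only for illustration; they are not Oozie code. The
isJobIdForThisServer() check is left out, and submitsWithGuard() is only one
assumed alternative, not a proposed patch.

```java
// Hypothetical, simplified model of the quoted checks; not Oozie code.
final class BundleRecoveryDecision {

    // Stand-in for the Job.Status values that matter here.
    enum Status { PREP, RUNNING, SUSPENDED }

    // Mirrors the quoted snippet: a null coord id is skipped, and a PREP
    // action with a non-null coord id leads to a new CoordSubmitXCommand.
    static boolean submitsAsQuoted(String coordId, Status status) {
        if (coordId == null) {
            return false; // the snippet logs an error and continues
        }
        return status == Status.PREP;
    }

    // Assumed alternative: never resubmit when a coord id already exists.
    static boolean submitsWithGuard(String coordId, Status status) {
        return coordId == null && status == Status.PREP;
    }

    public static void main(String[] args) {
        // Coord id taken from the logs in the issue description.
        String existingCoord = "0484437-131016085136608-oozie-oozi-C";
        System.out.println(submitsAsQuoted(existingCoord, Status.PREP));  // true
        System.out.println(submitsWithGuard(existingCoord, Status.PREP)); // false
    }
}
```

With the state seen in this issue (coord id already set, action in PREP), the
quoted logic queues a second submit, while the assumed guard does not.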

In this particular instance, the bundle action was in pending state because
the bundle job was suspended, and the coord was still in PREP state because
materialisation didn't happen. (I don't have debug logs, so I don't know why
materialisation didn't happen for 17 minutes; that is another odd scenario
that needs to be debugged.) Because the coord action was in PREP state, I
guess the bundle action was in PREP state too. But this is a valid scenario.



> Multiple CoordSubmit for same bundle
> ------------------------------------
>
>                 Key: OOZIE-1622
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1622
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Shwetha G S
>            Priority: Critical
>
> We saw a weird instance where multiple coords were created for the same
> bundle id when the bundle was supposed to have just 1 coordinator. Here are
> the Oozie logs:
> {noformat}
> 2013-11-19 09:09:46,473  INFO BundleStartXCommand:539 - USER[fetl] GROUP[-] 
> TOKEN[] APP[<app name>] JOB[0484436-131016085136608-oozie-oozi-B] ACTION[-] 
> Bundle 0484436-131016085136608-oozie-oozi-B is not in PREP status. It is in : 
> RUNNING
> 2013-11-19 09:09:46,473  WARN BundleStartXCommand:542 - USER[fetl] GROUP[-] 
> TOKEN[] APP[<app name>] JOB[0484436-131016085136608-oozie-oozi-B] ACTION[-] 
> E1100: Command precondition does not hold before execution, [Bundle 
> 0484436-131016085136608-oozie-oozi-B is not in PREP status. It is in : 
> RUNNING], Error Code: E1100
> 2013-11-19 09:09:46,473  INFO CoordSubmitXCommand:539 - USER[-] GROUP[-] 
> TOKEN[-] APP[-] JOB[0484436-131016085136608-oozie-oozi-B] ACTION[-] STARTED 
> Coordinator Submit
> 2013-11-19 09:09:46,483  INFO CoordSubmitXCommand:539 - USER[-] GROUP[-] 
> TOKEN[-] APP[-] JOB[0484436-131016085136608-oozie-oozi-B] ACTION[-] 
> configDefault Doesn't exist 
> 2013-11-19 09:09:46,515  INFO CoordSubmitXCommand:539 - USER[fetl] GROUP[-] 
> TOKEN[] APP[<app name>] JOB[0484437-131016085136608-oozie-oozi-C] ACTION[-] 
> ENDED Coordinator Submit jobId=0484437-131016085136608-oozie-oozi-C
> 2013-11-19 09:09:46,529  INFO BundleStatusUpdateXCommand:539 - USER[fetl] 
> GROUP[-] TOKEN[] APP[<app name>] JOB[0484437-131016085136608-oozie-oozi-C] 
> ACTION[-] Updated bundle action [0484436-131016085136608-oozie-oozi-B_<app 
> name>] from prev status [PREP] to current coord status [PREP], and new bundle 
> action pending [0]
> 2013-11-19 09:09:46,535  INFO CoordMaterializeTransitionXCommand:539 - 
> USER[fetl] GROUP[-] TOKEN[] APP[<app name>] 
> JOB[0484437-131016085136608-oozie-oozi-C] ACTION[-] materialize actions for 
> tz=Coordinated Universal Time,
> 2013-11-19 09:09:54,590  INFO StatusTransitService$StatusTransitRunnable:539 
> - USER[-] GROUP[-] Set bundle job [0484436-131016085136608-oozie-oozi-B] 
> status to 'RUNNING' from 'RUNNING'
> 2013-11-19 09:09:54,590  INFO StatusTransitService$StatusTransitRunnable:539 
> - USER[-] GROUP[-] Bundle job [0484436-131016085136608-oozie-oozi-B] Pending 
> set to FALSE
> 2013-11-19 09:10:16,326  INFO 
> CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:539 - USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Job 
> :0484437-131016085136608-oozie-oozi-C  numWaitingActions : 0 MatThrottle : 60
> 2013-11-19 09:12:57,246  INFO StatusTransitService$StatusTransitRunnable:539 
> - USER[-] GROUP[-] Set bundle job [0484436-131016085136608-oozie-oozi-B] 
> status to 'SUSPENDED' from 'SUSPENDED'
> 2013-11-19 09:12:57,246  INFO StatusTransitService$StatusTransitRunnable:539 
> - USER[-] GROUP[-] Bundle job [0484436-131016085136608-oozie-oozi-B] Pending 
> set to TRUE
> 2013-11-19 09:13:16,410  INFO 
> CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:539 - USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Job 
> :0484437-131016085136608-oozie-oozi-C  numWaitingActions : 0 MatThrottle : 60
> 2013-11-19 09:16:16,446  INFO 
> CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:539 - USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Job 
> :0484437-131016085136608-oozie-oozi-C  numWaitingActions : 0 MatThrottle : 60
> 2013-11-19 09:17:00,913  INFO StatusTransitService$StatusTransitRunnable:539 
> - USER[-] GROUP[-] Set bundle job [0484436-131016085136608-oozie-oozi-B] 
> status to 'RUNNING' from 'RUNNING'
> 2013-11-19 09:17:00,914  INFO StatusTransitService$StatusTransitRunnable:539 
> - USER[-] GROUP[-] Bundle job [0484436-131016085136608-oozie-oozi-B] Pending 
> set to TRUE
> 2013-11-19 09:19:16,490  INFO 
> CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:539 - USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Job 
> :0484437-131016085136608-oozie-oozi-C  numWaitingActions : 0 MatThrottle : 60
> 2013-11-19 09:22:16,907  INFO 
> CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:539 - USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Job 
> :0484437-131016085136608-oozie-oozi-C  numWaitingActions : 0 MatThrottle : 60
> 2013-11-19 09:25:17,086  INFO 
> CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:539 - USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Job 
> :0484437-131016085136608-oozie-oozi-C  numWaitingActions : 0 MatThrottle : 60
> 2013-11-19 09:26:49,373  INFO CoordSubmitXCommand:539 - USER[-] GROUP[-] 
> TOKEN[-] APP[-] JOB[0484436-131016085136608-oozie-oozi-B] ACTION[-] STARTED 
> Coordinator Submit
> 2013-11-19 09:26:49,383  INFO CoordSubmitXCommand:539 - USER[-] GROUP[-] 
> TOKEN[-] APP[-] JOB[0484436-131016085136608-oozie-oozi-B] ACTION[-] 
> configDefault Doesn't exist
> 2013-11-19 09:26:49,438  INFO CoordSubmitXCommand:539 - USER[fetl] GROUP[-] 
> TOKEN[] APP[<app name>] JOB[0484598-131016085136608-oozie-oozi-C] ACTION[-] 
> ENDED Coordinator Submit jobId=0484598-131016085136608-oozie-oozi-C
> 2013-11-19 09:26:49,445  INFO BundleStatusUpdateXCommand:539 - USER[fetl] 
> GROUP[-] TOKEN[] APP[<app name>] JOB[0484598-131016085136608-oozie-oozi-C] 
> ACTION[-] Updated bundle action [0484436-131016085136608-oozie-oozi-B_<app 
> name>] from prev status [PREP] to current coord status [PREP], and new bundle 
> action pending [1]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
