[
https://issues.apache.org/jira/browse/OOZIE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152626#comment-14152626
]
Hadoop QA commented on OOZIE-1940:
----------------------------------
Testing JIRA OOZIE-1940
Cleaning local git workspace
----------------------------
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
. {color:green}+1{color} the patch does not introduce any @author tags
. {color:green}+1{color} the patch does not introduce any tabs
. {color:green}+1{color} the patch does not introduce any trailing spaces
. {color:red}-1{color} the patch contains 4 line(s) longer than 132
characters
. {color:green}+1{color} the patch adds/modifies 3 testcase(s)
{color:green}+1 RAT{color}
. {color:green}+1{color} the patch does not seem to introduce new RAT
warnings
{color:green}+1 JAVADOC{color}
. {color:green}+1{color} the patch does not seem to introduce new Javadoc
warnings
{color:green}+1 COMPILE{color}
. {color:green}+1{color} HEAD compiles
. {color:green}+1{color} patch compiles
. {color:green}+1{color} the patch does not seem to introduce new javac
warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
. {color:green}+1{color} the patch does not change any JPA
Entity/Column/Basic/Lob/Transient annotations
. {color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
. Tests run: 1535
{color:green}+1 DISTRO{color}
. {color:green}+1{color} distro tarball builds with the patch
----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
. https://builds.apache.org/job/oozie-trunk-precommit-build/2009/
> StatusTransitService has race condition
> ---------------------------------------
>
> Key: OOZIE-1940
> URL: https://issues.apache.org/jira/browse/OOZIE-1940
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Attachments: OOZIE-1940-V5.patch, OOZIE-1940-V6.patch,
> OOZIE-1940-V7.patch, OOZIE-1940-V8.patch
>
>
> StatusTransitService doesn't acquire a lock while updating the DB.
> We noticed one such issue while doing HA testing, thanks to [~mchiang].
> We issued a change command to change the pause time, which was executed on
> one server. While the change command was running on that server, the other
> server started executing StatusTransitService.
> Server 1 log
> {code}
> 2014-07-16 17:28:05,268 INFO StatusTransitService$StatusTransitRunnable:539
> [pool-1-thread-13] - USER[-] GROUP[-] Acquired lock for
> [org.apache.oozie.service.StatusTransitService]
> 2014-07-16 17:28:09,694 INFO StatusTransitService$StatusTransitRunnable:539
> [pool-1-thread-13] - USER[-] GROUP[-] Set coordinator job
> [0011385-140716042555-oozie-oozi-C] status to 'SUCCEEDED' from 'RUNNING'
> 2014-07-16 17:28:15,416 INFO StatusTransitService$StatusTransitRunnable:539
> [pool-1-thread-13] - USER[-] GROUP[-] Released lock for
> [org.apache.oozie.service.StatusTransitService]
> {code}
> Server 2 log
> {code}
> 2014-07-16 17:28:06,499 DEBUG CoordChangeXCommand:545 [http-0.0.0.0-4443-5] -
> USER[hadoopqa] GROUP[users] TOKEN[] APP[coordB180]
> JOB[0011385-140716042555-oozie-oozi-C] ACTION[-] New pause/end date is : Wed
> Jul 16 17:30:00 UTC 2014 and last action number is : 3
> 2014-07-16 17:28:06,508 INFO CoordChangeXCommand:539 [http-0.0.0.0-4443-5] -
> USER[hadoopqa] GROUP[users] TOKEN[] APP[coordB180]
> JOB[0011385-140716042555-oozie-oozi-C] ACTION[-] ENDED CoordChangeXCommand
> for jobId=0011385-140716042555-oozie-oozi-C
> {code}
> CoordMaterializeTransitionXCommand had created all actions (a few were in
> waiting and a few were in running state) and set doneMaterialization to true.
> The change command deletes all waiting coord actions, leaving only the 3
> running/SUCCEEDED actions, and resets doneMaterialization.
> StatusTransitService first loads a set of pending jobs, and for each job it
> makes DB calls to check coord action status. Coord jobs are loaded only once,
> at the beginning.
> This is what happened:
> 1. StatusTransitService loads the coord job, whose doneMaterialization is set
> to true, at 17:28:05,268 (server 1).
> 2. The change command deletes the waiting actions and resets
> doneMaterialization at 17:28:06,508 (server 2).
> 3. StatusTransitService loads the actions for the job at 17:28:09,694
> (server 1): only 3, all in SUCCEEDED status. It never reloads
> doneMaterialization.
> StatusTransitService then sets the job status to SUCCEEDED, because its copy
> of doneMaterialization is still true and all actions are SUCCEEDED.
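
The interleaving described above can be replayed in a small, single-threaded Java sketch. This is only an illustration under stated assumptions, not Oozie code and not necessarily what the attached patches do: the class `StatusTransitRaceSketch`, the `CoordJob` stand-in, and all method names are hypothetical, the "DB" is an in-memory object, and a `ReentrantLock` stands in for whatever per-job lock the servers would share. It contrasts a status update that trusts the flag captured when the job list was loaded with one that takes the job lock and re-reads the flag before writing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class StatusTransitRaceSketch {

    enum Status { WAITING, RUNNING, SUCCEEDED }

    // Hypothetical in-memory stand-in for one coordinator job row and its actions.
    static final class CoordJob {
        final ReentrantLock lock = new ReentrantLock(); // stands in for a per-job lock
        boolean doneMaterialization = true;
        Status status = Status.RUNNING;
        final List<Status> actions = new ArrayList<>();
    }

    // Job as it looks before the change command: 3 finished actions, 2 still waiting.
    static CoordJob newJob() {
        CoordJob job = new CoordJob();
        for (int i = 0; i < 3; i++) job.actions.add(Status.SUCCEEDED);
        for (int i = 0; i < 2; i++) job.actions.add(Status.WAITING);
        return job;
    }

    // Change command: delete WAITING actions and reset the flag, under the job lock.
    static void changeCommand(CoordJob job) {
        job.lock.lock();
        try {
            job.actions.removeIf(a -> a == Status.WAITING);
            job.doneMaterialization = false;
        } finally {
            job.lock.unlock();
        }
    }

    static boolean allSucceeded(CoordJob job) {
        return job.actions.stream().allMatch(a -> a == Status.SUCCEEDED);
    }

    // Racy variant: trusts the flag value captured when the job list was loaded.
    static void transitWithStaleFlag(CoordJob job, boolean flagAtLoadTime) {
        if (flagAtLoadTime && allSucceeded(job)) {
            job.status = Status.SUCCEEDED;
        }
    }

    // Guarded variant: takes the job lock and re-reads the flag before writing.
    static void transitWithLock(CoordJob job) {
        job.lock.lock();
        try {
            if (job.doneMaterialization && allSucceeded(job)) {
                job.status = Status.SUCCEEDED;
            }
        } finally {
            job.lock.unlock();
        }
    }

    // Replays steps 1-3 in order and returns the final job status.
    static Status replay(boolean guarded) {
        CoordJob job = newJob();
        boolean flagAtLoadTime = job.doneMaterialization; // step 1: job loaded once
        changeCommand(job);                               // step 2: change command runs
        if (guarded) {                                    // step 3: status transit
            transitWithLock(job);
        } else {
            transitWithStaleFlag(job, flagAtLoadTime);
        }
        return job.status;
    }

    public static void main(String[] args) {
        System.out.println("stale flag: " + replay(false)); // SUCCEEDED (wrong)
        System.out.println("with lock:  " + replay(true));  // RUNNING
    }
}
```

In the sketch the stale-flag path marks the job SUCCEEDED even though materialization was reset, while the guarded path leaves it RUNNING. Across two servers the lock would have to be a distributed one; the local lock here only models the ordering.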
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)