-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18995/#review38356
-----------------------------------------------------------

Can we change the title of the jira to say "Support resuming of failed coordinator job and rerun of a failed coordinator action" to make it clearer, as we are not trying to re-run the coordinator job?


core/src/main/java/org/apache/oozie/command/coord/CoordChangeXCommand.java
<https://reviews.apache.org/r/18995/#comment70447>

    E1015, jobStatus, "Only FAILED or KILLED job can be changed to RUNNING. Current job status is " + coordJob.getStatus());


core/src/main/java/org/apache/oozie/command/coord/CoordChangeXCommand.java
<https://reviews.apache.org/r/18995/#comment70450>

    The parent bundle status and pending are not being updated. I don't think this will work when the coordinator is part of a bundle.


docs/src/site/twiki/DG_CommandLineTool.twiki
<https://reviews.apache.org/r/18995/#comment70456>

    Currently, status only takes RUNNING and can be used to change the status of a coordinator job in FAILED or KILLED status back to RUNNING and resume materialization. This status change does not affect the status of already materialized actions in the coordinator. If there are FAILED or KILLED coordinator actions, they will have to be rerun separately.


- Rohini Palaniswamy


On March 24, 2014, 5:42 p.m., Purshotam Shah wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18995/
> -----------------------------------------------------------
> 
> (Updated March 24, 2014, 5:42 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-1735
>     https://issues.apache.org/jira/browse/OOZIE-1735
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> We should support rerunning of a failed job. Jobs are set to FAILED if there are runtime errors (like an SQL timeout).
> In the current scenario there is no way to recover besides running SQL.
> Rerun should set the coord status to RUNNING, set pending to 1, reset doneMaterialization, and set last modified to the current time, so that materialization continues.
> 
> We should also provide an option of resuming a failed action. The behavior will be the same as the killed option.
> 
> 
> Diffs
> -----
> 
>   client/src/main/java/org/apache/oozie/cli/OozieCLI.java 87e2f27
>   client/src/main/java/org/apache/oozie/client/OozieClient.java b0a85fd
>   core/src/main/java/org/apache/oozie/command/coord/CoordChangeXCommand.java 4957330
>   core/src/main/java/org/apache/oozie/command/coord/CoordRerunXCommand.java 301737b
>   core/src/test/java/org/apache/oozie/command/coord/TestCoordChangeXCommand.java b9bbf16
>   core/src/test/java/org/apache/oozie/command/coord/TestCoordRerunXCommand.java 3cee71a
>   docs/src/site/twiki/DG_CommandLineTool.twiki 0748ff8
> 
> Diff: https://reviews.apache.org/r/18995/diff/
> 
> 
> Testing
> -------
> 
> purushah$ ./oozie job -info 0000000-140324095133518-oozie-puru-C -oozie http://localhost:11000/oozie
> Job ID : 0000000-140324095133518-oozie-puru-C
> ------------------------------------------------------------------------------------------------------------------------------------
> Job Name    : aggregator-coord
> App Path    : hdfs://localhost:9000/user/purushah/examples/apps/aggregator/coordinator.xml
> Status      : RUNNING
> Start Time  : 2010-01-01 01:00 GMT
> End Time    : 2010-01-01 03:00 GMT
> Pause Time  : -
> Concurrency : 1
> ------------------------------------------------------------------------------------------------------------------------------------
> ID                                       Status   Ext ID                                 Err Code   Created                Nominal Time
> 0000000-140324095133518-oozie-puru-C@1   KILLED   0000001-140324095133518-oozie-puru-W   -          2014-03-24 16:52 GMT   2010-01-01 01:00 GMT
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000000-140324095133518-oozie-puru-C@2   KILLED   0000002-140324095133518-oozie-puru-W   -          2014-03-24 16:56 GMT   2010-01-01 02:00 GMT
> ------------------------------------------------------------------------------------------------------------------------------------
> purushah$
> purushah$ ./oozie job -kill 0000000-140324095133518-oozie-puru-C -oozie http://localhost:11000/oozie
> purushah$ ./oozie job -info 0000000-140324095133518-oozie-puru-C -oozie http://localhost:11000/oozie
> Job ID : 0000000-140324095133518-oozie-puru-C
> ------------------------------------------------------------------------------------------------------------------------------------
> Job Name    : aggregator-coord
> App Path    : hdfs://localhost:9000/user/purushah/examples/apps/aggregator/coordinator.xml
> Status      : KILLED
> Start Time  : 2010-01-01 01:00 GMT
> End Time    : 2010-01-01 03:00 GMT
> Pause Time  : -
> Concurrency : 1
> ------------------------------------------------------------------------------------------------------------------------------------
> ID                                       Status   Ext ID                                 Err Code   Created                Nominal Time
> 0000000-140324095133518-oozie-puru-C@1   KILLED   0000001-140324095133518-oozie-puru-W   -          2014-03-24 16:52 GMT   2010-01-01 01:00 GMT
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000000-140324095133518-oozie-puru-C@2   KILLED   0000002-140324095133518-oozie-puru-W   -          2014-03-24 16:56 GMT   2010-01-01 02:00 GMT
> ------------------------------------------------------------------------------------------------------------------------------------
> purushah$ ./oozie job -change 0000000-140324095133518-oozie-puru-C -value status=RUNNING -oozie http://localhost:11000/oozie
> purushah$ ./oozie job -info 0000000-140324095133518-oozie-puru-C -oozie http://localhost:11000/oozie
> Job ID : 0000000-140324095133518-oozie-puru-C
> ------------------------------------------------------------------------------------------------------------------------------------
> Job Name    : aggregator-coord
> App Path    : hdfs://localhost:9000/user/purushah/examples/apps/aggregator/coordinator.xml
> Status      : RUNNING
> Start Time  : 2010-01-01 01:00 GMT
> End Time    : 2010-01-01 03:00 GMT
> Pause Time  : -
> Concurrency : 1
> ------------------------------------------------------------------------------------------------------------------------------------
> ID                                       Status   Ext ID                                 Err Code   Created                Nominal Time
> 0000000-140324095133518-oozie-puru-C@1   KILLED   0000001-140324095133518-oozie-puru-W   -          2014-03-24 16:52 GMT   2010-01-01 01:00 GMT
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000000-140324095133518-oozie-puru-C@2   KILLED   0000002-140324095133518-oozie-puru-W   -          2014-03-24 16:56 GMT   2010-01-01 02:00 GMT
> ------------------------------------------------------------------------------------------------------------------------------------
> purushah$
> 
> 
> Thanks,
> 
> Purshotam Shah
> 
>
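
For reference, the status-change rule discussed in this review could be sketched roughly as below. This is only an illustration under assumptions, not the code in CoordChangeXCommand: the class CoordStatusChangeSketch, the CoordJob holder, and its field names are hypothetical. The field updates follow the request description (status set to RUNNING, pending set, doneMaterialization reset, last modified time refreshed), and the parent bundle update raised in the review is deliberately left out.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; none of these names are taken from the Oozie code base.
class CoordStatusChangeSketch {

    enum Status { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    // Minimal stand-in for a coordinator job record (assumed fields).
    static final class CoordJob {
        Status status;
        boolean pending;
        boolean doneMaterialization;
        long lastModifiedMillis;

        CoordJob(Status status) {
            this.status = status;
            this.doneMaterialization = true;
        }
    }

    // Only jobs in one of these states may be moved back to RUNNING.
    private static final Set<Status> RESUMABLE = EnumSet.of(Status.FAILED, Status.KILLED);

    // Validates the request, then resets the job so materialization can continue.
    // Bundle status and pending are NOT touched here; that gap is what the review points out.
    static void changeStatusToRunning(CoordJob job, Status requested, long nowMillis) {
        if (requested != Status.RUNNING) {
            throw new IllegalArgumentException(
                    "Only RUNNING is accepted as the new status. Requested status is " + requested);
        }
        if (!RESUMABLE.contains(job.status)) {
            throw new IllegalStateException(
                    "Only FAILED or KILLED job can be changed to RUNNING. Current job status is "
                            + job.status);
        }
        job.status = Status.RUNNING;
        job.pending = true;                 // "set pending to 1"
        job.doneMaterialization = false;    // allow further actions to be materialized
        job.lastModifiedMillis = nowMillis; // "last modified to current time"
    }

    public static void main(String[] args) {
        CoordJob job = new CoordJob(Status.KILLED);
        changeStatusToRunning(job, Status.RUNNING, System.currentTimeMillis());
        System.out.println(job.status + " pending=" + job.pending);
    }
}
```

As in the testing transcript, already materialized actions (the two KILLED actions) are untouched by such a change and would still need a separate rerun.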
