-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29298/
-----------------------------------------------------------
Review request for Ambari, Alejandro Fernandez and Nate Cole.
Bugs: AMBARI-8852
https://issues.apache.org/jira/browse/AMBARI-8852
Repository: ambari
Description
-------
During a rolling upgrade (RU), a failure occurred in the "Client Components"
group, on the "Service Check HBASE, MAPREDUCE2, HDFS, YARN" item.
The UI presented me with a Retry button. However, the server rejected the
resulting request:
PUT /api/v1/clusters/ysru2/upgrades/5/upgrade_groups/4/upgrade_items/30
{"UpgradeItem":{"status":"PENDING"}}
{
"status" : 400,
"message" : "java.lang.IllegalArgumentException: Can not transition a stage
from FAILED to PENDING"
}
I believe this is the current expected behavior since the failure is not marked
to hold.
However, on any service check failure, the user should be able to retry (or
perhaps on any failure, since actions should be idempotent).
----
Allow Retry - mark a stage (upgrade item) to allow any failed task to be
retried. This means that if a failure occurs during the execution of a task,
then the stage and the task will transition to HOLDING_FAILED. Once in the
HOLDING_FAILED state, the stage can be pushed to PENDING (retry) or FAILED.
Transitioning the stage to FAILED will cause the remaining tasks in that stage
to be ABORTED. It never makes sense to allow the remaining tasks of a stage to
continue executing after the stage has been accepted as FAILED. However, the
remaining stages of the upgrade request may be allowed to execute.
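To make the Allow Retry transitions concrete, here is a minimal sketch. It is
not the code in this patch: the class name, the method names and the reduced
set of statuses are all hypothetical, and it covers only the transitions
described above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration only; not the actual Ambari classes.
public class StageRetrySketch {

    // Reduced set of statuses, using the names from the description above.
    public enum Status { PENDING, HOLDING_FAILED, FAILED, ABORTED }

    // Status a stage/task lands in when a task fails.
    // With allow-retry set, the failure is held instead of being final.
    public static Status onTaskFailure(boolean allowRetry) {
        return allowRetry ? Status.HOLDING_FAILED : Status.FAILED;
    }

    // Transitions a user may request manually from the given status.
    // Only HOLDING_FAILED can be pushed to PENDING (retry) or FAILED.
    public static Set<Status> manualTransitions(Status current) {
        if (current == Status.HOLDING_FAILED) {
            return EnumSet.of(Status.PENDING, Status.FAILED);
        }
        return EnumSet.noneOf(Status.class);
    }

    // Validates a requested transition; the message mirrors the wording of
    // the server error quoted earlier in this description.
    public static Status requestTransition(Status current, Status desired) {
        if (!manualTransitions(current).contains(desired)) {
            throw new IllegalArgumentException(
                "Can not transition a stage from " + current + " to " + desired);
        }
        return desired;
    }
}
```

In this sketch a stage without allow-retry goes straight to FAILED, so a later
request to move it to PENDING is rejected, which matches the behavior reported
above.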
Skippable - mark a stage to allow it to be skipped in the event of a failure so
that the remaining stages may still execute. This means that when a stage state
is set to FAILED, it will not trigger the remaining stages of the request to
abort.
By separating the concepts of retry and skippable, we can be more flexible in
how we define the behavior of the upgrade. For example, the core masters
upgrade item should be marked as allow_retry = true and skippable = false. If a
failure occurs during this stage, you should be able to retry. If the failure
cannot be resolved, then the entire upgrade request should be aborted.
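The effect of the skippable flag on the rest of the request can be sketched as
follows. This is an illustration under assumptions, not the implementation in
this patch: the class, the StageFlags holder and the stagesToAbort helper are
hypothetical names, and stages are modeled as a simple ordered list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual Ambari classes.
public class SkippableSketch {

    // Per-stage flags as described above (names are assumptions).
    public static final class StageFlags {
        public final String name;
        public final boolean allowRetry;
        public final boolean skippable;

        public StageFlags(String name, boolean allowRetry, boolean skippable) {
            this.name = name;
            this.allowRetry = allowRetry;
            this.skippable = skippable;
        }
    }

    // Names of the later stages that would be aborted once the stage at
    // failedIndex has been accepted as FAILED. A skippable stage does not
    // abort anything; a non-skippable one aborts every stage after it.
    public static List<String> stagesToAbort(List<StageFlags> stages, int failedIndex) {
        List<String> aborted = new ArrayList<>();
        if (stages.get(failedIndex).skippable) {
            return aborted;
        }
        for (int i = failedIndex + 1; i < stages.size(); i++) {
            aborted.add(stages.get(i).name);
        }
        return aborted;
    }
}
```

With a "Core Masters" stage marked allow_retry = true and skippable = false,
accepting its failure aborts every later stage, while accepting the failure of
a skippable service check stage leaves the later stages free to run.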
Diffs
-----
ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java
ccecad9
ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleCommand.java
f71e2d5
ambari-server/src/main/java/org/apache/ambari/server/actionmanager/Stage.java
4922fa5
ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariActionExecutionHelper.java
17d5782
ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariCustomCommandExecutionHelper.java
c8ae61d
ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariManagementControllerImpl.java
19ee6d9
ambari-server/src/main/java/org/apache/ambari/server/controller/KerberosHelper.java
fb19bd5
ambari-server/src/main/java/org/apache/ambari/server/controller/internal/ClusterStackVersionResourceProvider.java
9329ea9
ambari-server/src/main/java/org/apache/ambari/server/controller/internal/HostStackVersionResourceProvider.java
3b1b462
ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java
fa39c97
ambari-server/src/main/java/org/apache/ambari/server/utils/StageUtils.java
e6e51a1
ambari-server/src/test/java/org/apache/ambari/server/actionmanager/ExecutionCommandWrapperTest.java
948f137
ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java
a756275
ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionManager.java
01a40f4
ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java
8ce4ff2
ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestStage.java
bde19a1
ambari-server/src/test/java/org/apache/ambari/server/agent/TestHeartbeatHandler.java
a6df0db
ambari-server/src/test/java/org/apache/ambari/server/controller/AmbariManagementControllerTest.java
72a22e6
ambari-server/src/test/java/org/apache/ambari/server/serveraction/ServerActionExecutorTest.java
4bd0d18
ambari-server/src/test/java/org/apache/ambari/server/stageplanner/TestStagePlanner.java
dd2a519
Diff: https://reviews.apache.org/r/29298/diff/
Testing
-------
Results :
Tests run: 2447, Failures: 0, Errors: 0, Skipped: 13
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 27:49 min
[INFO] Finished at: 2014-12-21T23:52:28-05:00
[INFO] Final Memory: 42M/496M
[INFO] ------------------------------------------------------------------------
Thanks,
Tom Beerbower