[ 
https://issues.apache.org/jira/browse/SLIDER-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085066#comment-14085066
 ] 

Ted Yu commented on SLIDER-276:
-------------------------------

Here is the proposed change:
{code}
diff --git a/slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeEntry.java b/slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeEnt
index a9e5a8c..c8ab2a7 100644
--- a/slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeEntry.java
+++ b/slider-core/src/main/java/org/apache/slider/server/appmaster/state/NodeEntry.java
@@ -169,7 +169,6 @@ public class NodeEntry {
    * Release an instance -which is no longer marked as active
    */
   public synchronized void release() {
-    assert live > 0 : "no live nodes to release";
     releasing++;
   }
{code}

> Inaccurate assertion in NodeEntry#release()
> -------------------------------------------
>
>                 Key: SLIDER-276
>                 URL: https://issues.apache.org/jira/browse/SLIDER-276
>             Project: Slider
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Minor
>
> I issued a flex command to reduce the number of region servers by 1:
> {code}
> 14/08/04 18:14:52 INFO state.AppState: RoleStatus{name='HBASE_REGIONSERVER', 
> key=2, desired=1, actual=2, requested=0, releasing=0, failed=0, started=2, 
> startFailed=0, completed=0, failureMessage=''}
> 14/08/04 18:14:52 INFO state.AppState: HBASE_REGIONSERVER: Asking for 1 fewer 
> node(s) for a total of 1
> 14/08/04 18:14:52 INFO state.AppState: RoleStatus{name='HBASE_MASTER', key=1, 
> desired=1, actual=1, requested=0, releasing=0, failed=0, started=1, 
> startFailed=0, completed=0, failureMessage=''}
> 14/08/04 18:14:52 INFO state.AppState: RoleStatus{name='HBASE_REST', key=3, 
> desired=1, actual=1, requested=0, releasing=0, failed=0, started=1, 
> startFailed=0, completed=0, failureMessage=''}
> 14/08/04 18:14:52 INFO appmaster.SliderAppMaster: onContainersCompleted([1]
> 14/08/04 18:14:52 INFO appmaster.SliderAppMaster: Container Completion for 
> containerID=container_1405721039692_0013_01_000004, state=COMPLETE, 
> exitStatus=-100, diagnostics=Container released by application
> 14/08/04 18:14:52 INFO state.AppState: Container was queued for release
> 14/08/04 18:14:52 INFO state.AppState: decrementing role count for role 
> HBASE_REGIONSERVER
> 14/08/04 18:14:53 INFO state.AppState: RoleStatus{name='HBASE_REGIONSERVER', 
> key=2, desired=1, actual=1, requested=0, releasing=0, failed=0, started=2, 
> startFailed=0, completed=1, failureMessage=''}
> 14/08/04 18:14:53 INFO state.AppState: RoleStatus{name='HBASE_MASTER', key=1, 
> desired=1, actual=1, requested=0, releasing=0, failed=0, started=1, 
> startFailed=0, completed=0, failureMessage=''}
> 14/08/04 18:14:53 INFO state.AppState: RoleStatus{name='HBASE_REST', key=3, 
> desired=1, actual=1, requested=0, releasing=0, failed=0, started=1, 
> startFailed=0, completed=0, failureMessage=''}
> 14/08/04 18:16:18 WARN agent.HeartbeatMonitor: Component 
> container_1405721039692_0013_01_000004___HBASE_REGIONSERVER marked UNHEALTHY. 
> Last heartbeat received at 1407176092207 approx. 86129 ms. back.
> 14/08/04 18:17:18 WARN agent.HeartbeatMonitor: Component 
> container_1405721039692_0013_01_000004___HBASE_REGIONSERVER marked 
> HEARTBEAT_LOST. Last heartbeat received at 1407176092207 approx. 146130 ms. 
> back.
> 14/08/04 18:17:18 INFO appmaster.SliderAppMaster: Refreshing container 
> container_1405721039692_0013_01_000004 per provider request.
> 14/08/04 18:17:18 WARN agent.HeartbeatMonitor: ERROR
> java.lang.AssertionError: no live nodes to release
>       at org.apache.slider.server.appmaster.state.NodeEntry.release(NodeEntry.java:172)
>       at org.apache.slider.server.appmaster.state.RoleHistory.onContainerReleaseSubmitted(RoleHistory.java:656)
>       at org.apache.slider.server.appmaster.state.AppState.containerReleaseSubmitted(AppState.java:919)
>       at org.apache.slider.server.appmaster.state.AppState.releaseContainer(AppState.java:1491)
>       at org.apache.slider.server.appmaster.SliderAppMaster.refreshContainer(SliderAppMaster.java:1444)
>       at org.apache.slider.providers.agent.AgentProviderService.releaseContainer(AgentProviderService.java:391)
>       at org.apache.slider.providers.agent.HeartbeatMonitor.doWork(HeartbeatMonitor.java:109)
>       at org.apache.slider.providers.agent.HeartbeatMonitor.run(HeartbeatMonitor.java:69)
>       at java.lang.Thread.run(Thread.java:722)
> {code}
> As can be seen above, the NodeEntry#containerCompleted() event was received 
> before NodeEntry#release() was called, so live was no longer positive when 
> release() ran.
> This triggered the following assertion:
> {code}
>   public synchronized void release() {
>     assert live > 0 : "no live nodes to release";
> {code}
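
The ordering described above can be illustrated with a minimal sketch. It assumes a simplified counter model: NodeEntrySketch and all of its methods are hypothetical stand-ins, not the actual Slider NodeEntry class. It only shows why a `live > 0` precondition cannot hold once the completion event has already been processed.

```java
/** Hypothetical, simplified model of per-node counters; NOT the real Slider NodeEntry. */
class NodeEntrySketch {
  private int live;       // containers believed to be running on this node
  private int releasing;  // release requests submitted but not yet completed

  /** A container started on this node. */
  synchronized void onStarted() {
    live++;
  }

  /** A container finished; the counters are never allowed to go negative. */
  synchronized void containerCompleted() {
    live = Math.max(0, live - 1);
    releasing = Math.max(0, releasing - 1);
  }

  /** True when the removed precondition (live > 0) would have held. */
  synchronized boolean releasePreconditionHolds() {
    return live > 0;
  }

  /** Mirrors the patched method: record the release without asserting on live. */
  synchronized void release() {
    releasing++;
  }

  synchronized int getLive() { return live; }

  synchronized int getReleasing() { return releasing; }

  public static void main(String[] args) {
    NodeEntrySketch entry = new NodeEntrySketch();
    entry.onStarted();           // one container is live on the node
    entry.containerCompleted();  // the completion event arrives first
    // A release request for the same container arrives afterwards.
    System.out.println("precondition holds: " + entry.releasePreconditionHolds());
    entry.release();             // with the assertion removed, this no longer throws
    System.out.println("live=" + entry.getLive() + ", releasing=" + entry.getReleasing());
  }
}
```

In this model the late release() call finds live already at 0, which is the state in which the original assertion fired.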



--
This message was sent by Atlassian JIRA
(v6.2#6252)
