[ 
https://issues.apache.org/jira/browse/HBASE-18525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115670#comment-16115670
 ] 

Umesh Agashe commented on HBASE-18525:
--------------------------------------

Hi [~tedyu],

I think that with the current code UnassignProcedure is expected to fail, so 
in the short term the test can be changed accordingly.

The test instantiates SocketTimeoutRsExecutor with maxServerRetries of 3. For 
AssignProcedure, the number of retries is determined by 
AssignmentManager.getAssignMaxAttempts(), but for UnassignProcedure the 
procedure is marked as failed on the first failure. The long-term fix is to 
give the unassign operation a max-attempts limit, as assign has.
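
To illustrate the difference, here is a rough, standalone sketch of a bounded 
retry loop. It is not HBase code: BoundedRetrySketch, FlakyCall and 
runWithMaxAttempts are hypothetical names, and FlakyCall simply assumes a 
remote call that times out a fixed number of times before succeeding, which 
may not match what SocketTimeoutRsExecutor actually does.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical illustration only; none of these names exist in HBase.
class BoundedRetrySketch {

    /** Stand-in for a remote call that times out a fixed number of times, then succeeds. */
    static final class FlakyCall implements Callable<Boolean> {
        private int remainingFailures;

        FlakyCall(int failures) {
            this.remainingFailures = failures;
        }

        @Override
        public Boolean call() throws SocketTimeoutException {
            if (remainingFailures > 0) {
                remainingFailures--;
                throw new SocketTimeoutException("simulated timeout");
            }
            return Boolean.TRUE;
        }
    }

    /**
     * Runs op up to maxAttempts times.
     * Returns the number of attempts used on success, or -1 if every attempt failed.
     */
    static int runWithMaxAttempts(Callable<Boolean> op, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.call();
                return attempt;
            } catch (Exception e) {
                // Treat any failure as retryable until the attempt budget is used up.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Budget of 1: the first timeout fails the whole operation.
        System.out.println(runWithMaxAttempts(new FlakyCall(3), 1));
        // Larger budget: the operation survives three timeouts.
        System.out.println(runWithMaxAttempts(new FlakyCall(3), 10));
    }
}
```

With a budget of 1 the call is reported as failed after the first timeout, 
which is roughly the unassign behaviour described above. With a larger budget 
the same three timeouts are absorbed, which is what a max-attempts limit for 
unassign would allow.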

Before HBASE-18491, we simply assumed that when communication with the RS 
fails, ServerCrashProcedure will take care of reassigning the region, so the 
unassign operation can be considered successful. That assumption was a cause 
of the failure of another test that was re-enabled in HBASE-18491.

Let me know your thoughts.

Thanks, Umesh


> TestAssignmentManager#testSocketTimeout fails in master branch
> --------------------------------------------------------------
>
>                 Key: HBASE-18525
>                 URL: https://issues.apache.org/jira/browse/HBASE-18525
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 18525.v1.txt
>
>
> Toward the end of the test output, I saw:
> {code}
> 2017-08-05 03:30:16,591 INFO  [Time-limited test] 
> assignment.TestAssignmentManager(446): ExecutionException
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hbase.master.procedure.ServerCrashException: 
> ServerCrashProcedure pid=3, server=localhost,103,1
>   at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait$ProcedureFuture.get(ProcedureSyncWait.java:104)
>   at 
> org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait$ProcedureFuture.get(ProcedureSyncWait.java:62)
>   at 
> org.apache.hadoop.hbase.master.assignment.TestAssignmentManager.waitOnFuture(TestAssignmentManager.java:444)
>   at 
> org.apache.hadoop.hbase.master.assignment.TestAssignmentManager.testSocketTimeout(TestAssignmentManager.java:255)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hbase.master.procedure.ServerCrashException: 
> ServerCrashProcedure pid=3, server=localhost,103,1
>   at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:169)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:274)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:57)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:847)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1440)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1209)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:79)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1719)
> {code}
> This test failure seems to happen after HBASE-18491 was checked in.
> Looking at the change in UnassignProcedure, it seems we should handle the two 
> conditions differently:
> {code}
>      if (serverCrashed.get() || !isServerOnline(env, regionNode)) {
> {code}
> With the attached patch, TestAssignmentManager#testSocketTimeout and 
> TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta pass.



