[
https://issues.apache.org/jira/browse/HBASE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083428#comment-16083428
]
Umesh Agashe commented on HBASE-18366:
--------------------------------------
Thanks [~stack], [~yangzhe1991]!
I think it's a timing issue, as I have seen it pass too, but for me it fails
far more often than it passes. I am still debugging it. From what I
see:
The TableNotFoundException is for table
'testRecoveryAndDoubleExecution-carryingMeta-true'. This table is created by
the test, and the exception is thrown in util.countRows() when the table is
scanned, in the following code snippet:
{code}
// Now run through the procedure twice crashing the executor on each step...
MasterProcedureTestingUtility.testRecoveryAndDoubleExecution(procExec, procId);
// Assert all data came back.
assertEquals(count, util.countRows(t));
{code}
Here is the exception:
{code}
org.apache.hadoop.hbase.TableNotFoundException: testRecoveryAndDoubleExecution-carryingMeta-true
  at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:845)
  at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:745)
  at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:720)
  at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:316)
  at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:139)
  at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
  at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:104)
  at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
{code}
At this time I am not sure how the changes for HBASE-17931 are
affecting the test, but after reverting them locally I ran the test 4-5 times
and it passed every time. If the meta region is being transitioned while the
scan is in progress, we can see this exception, but I still have to confirm
that this is the case here.
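One way to check whether the exception is only transient (which would support the timing theory) is to retry the scan for a bounded number of attempts. The following is a minimal diagnostic sketch, not HBase API: the class RetryUntil and the method callWithRetry are hypothetical names introduced here for illustration, and the attempt count and sleep interval are arbitrary.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of HBase): retries an operation a bounded
// number of times so a transient failure can be told apart from a permanent one.
public class RetryUntil {

    /**
     * Calls op until it succeeds or maxAttempts is reached.
     * Sleeps sleepMillis between attempts and rethrows the last failure.
     */
    public static <T> T callWithRetry(Callable<T> op, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }
}
```

In the test, the operation passed in would wrap the util.countRows(t) call from the snippet above. If the count comes back correct after a few retries, the table is only temporarily unlocatable. Since the helper retries on any exception, it is suited to local debugging rather than as a fix to commit.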
AssignmentManager.checkIfShouldMoveSystemRegionAsync() is called during
active master initialization and from RegionServerTracker.refresh(), and
moveAsync() is used to submit the procedure. This could explain the timing
issue. If I cannot get to the bottom of this by tomorrow, I will disable the
test and continue working on it.
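The suspected race can be illustrated with a toy model. This sketch is not HBase code and makes no claim about how AssignmentManager actually works: the class AsyncMoveRace, its map of locations, and the signature of its moveAsync method are all assumptions made for illustration. It only shows that when a move is submitted asynchronously, a lookup that lands between the unassign and the reassign fails, while the same lookup succeeds once the move completes. Latches are used to force that interleaving deterministically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: all names here are hypothetical, not HBase classes.
public class AsyncMoveRace {
    // Stand-in for region locations: region name -> hosting server.
    private final Map<String, String> locations = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public AsyncMoveRace() {
        locations.put("meta", "rs1");
    }

    /** Submits the move and returns immediately; the caller does not wait. */
    public Future<?> moveAsync(String region, String target,
                               CountDownLatch unassigned, CountDownLatch mayAssign) {
        return executor.submit(() -> {
            locations.remove(region);   // region is briefly unassigned
            unassigned.countDown();     // signal that the window is open
            mayAssign.await();          // hold the window open for the demo
            locations.put(region, target);
            return null;
        });
    }

    /** Fails if the region currently has no location. */
    public String locate(String region) {
        String server = locations.get(region);
        if (server == null) {
            throw new IllegalStateException("not found: " + region);
        }
        return server;
    }

    public void shutdown() {
        executor.shutdown();
    }

    /** Returns true if a lookup fails during the move and succeeds after it. */
    public static boolean lookupFailsDuringMove() throws Exception {
        AsyncMoveRace race = new AsyncMoveRace();
        CountDownLatch unassigned = new CountDownLatch(1);
        CountDownLatch mayAssign = new CountDownLatch(1);
        Future<?> move = race.moveAsync("meta", "rs2", unassigned, mayAssign);
        unassigned.await();             // wait until the region is in transition
        boolean failed;
        try {
            race.locate("meta");
            failed = false;
        } catch (IllegalStateException e) {
            failed = true;              // lookup hit the transition window
        }
        mayAssign.countDown();          // let the move finish
        move.get();
        boolean recovered = "rs2".equals(race.locate("meta"));
        race.shutdown();
        return failed && recovered;
    }
}
```

In a real run nothing forces the interleaving, so whether the scan hits the window depends on scheduling, which would match a test that passes sometimes and fails at other times.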
> Fix flaky test
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
> ---------------------------------------------------------------------------------------------------------
>
> Key: HBASE-18366
> URL: https://issues.apache.org/jira/browse/HBASE-18366
> Project: HBase
> Issue Type: Bug
> Reporter: Umesh Agashe
> Assignee: Umesh Agashe
>
> It worked for a few days after being enabled with HBASE-18278, but started
> failing after commits:
> 6786b2b
> 68436c9
> 75d2eca
> 50bb045
> df93c13
> It works with one commit before: c5abb6c. Need to see what changed with those
> commits.
> Currently it fails with TableNotFoundException.