[ https://issues.apache.org/jira/browse/HBASE-13391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395898#comment-14395898 ]
Andrew Purtell edited comment on HBASE-13391 at 4/4/15 7:39 PM:
----------------------------------------------------------------
I was finally able to reproduce a failure once by introducing some external IO
and CPU activity. Attached are logs of
TestRegionObserverInterface#testLegacyRecovery for a passing case and a failing
case.
One thing I see: when line 683 of TestRegionObserverInterface logs "All
regions assigned", the failing case still shows SplitWorker activity in
progress. Have replay ops not finished by the time we check for WAL-related
CP method invocations? I thought I'd see if disabling distributed replay
would change the behavior of the test:
{code}
diff --git a/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java b/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
index 5bd8b19..ba028dc 100644
--- a/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
+++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
@@ -39,6 +39,7 @@ import org.apache.hadoop.hbase.CellUtil;
 import org.apache.hadoop.hbase.Coprocessor;
 import org.apache.hadoop.hbase.HBaseTestingUtility;
 import org.apache.hadoop.hbase.HColumnDescriptor;
+import org.apache.hadoop.hbase.HConstants;
 import org.apache.hadoop.hbase.HRegionInfo;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.KeyValue;
@@ -101,6 +102,7 @@ public class TestRegionObserverInterface {
     conf.setStrings(CoprocessorHost.REGION_COPROCESSOR_CONF_KEY,
         "org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver",
         "org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver$Legacy");
+    conf.setBoolean(HConstants.DISTRIBUTED_LOG_REPLAY_KEY, false);
     util.startMiniCluster();
     cluster = util.getMiniHBaseCluster();
{code}
but that causes a different sort of failure:
{noformat}
java.lang.AssertionError: Result of org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver$Legacy.getCtPreWALRestore is expected to be 1, while we get 3
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.verifyMethodResult(TestRegionObserverInterface.java:753)
    at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.testLegacyRecovery(TestRegionObserverInterface.java:687)
{noformat}
Any thoughts on what might be going on here [~busbey]?
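If the "expected 1, got 0" failure is a timing race between region assignment and the end of replay, one way to probe it would be to poll the counter with a timeout instead of checking it once. The following is only a sketch under that assumption: `CounterWait`, `waitForCount`, and the simulated replay thread are hypothetical names for illustration and are not part of TestRegionObserverInterface or HBase. It would not explain the "expected 1, got 3" result seen with distributed replay disabled, since waiting can only help when the count is too low.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class CounterWait {
    /**
     * Polls the counter until it reaches at least the expected value or the
     * timeout elapses, and returns the last value observed.
     */
    public static int waitForCount(IntSupplier counter, int expected,
                                   long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        int seen = counter.getAsInt();
        // Compare via subtraction so the check stays correct if nanoTime wraps.
        while (seen < expected && System.nanoTime() - deadline < 0) {
            Thread.sleep(intervalMs);
            seen = counter.getAsInt();
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a coprocessor hook counter such as getCtPreWALRestore.
        AtomicInteger preWALRestoreCalls = new AtomicInteger();

        // Simulated replay that completes some time after "all regions assigned".
        Thread replay = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            preWALRestoreCalls.incrementAndGet();
        });
        replay.start();

        int seen = waitForCount(preWALRestoreCalls::get, 1, 5000, 20);
        replay.join();
        System.out.println("observed count: " + seen);
    }
}
```

A single immediate read in place of `waitForCount` would return 0 in this simulation, which mirrors the original failure. If polling made the real test pass, that would support the race theory without settling where the extra invocations come from.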
> TestRegionObserverInterface frequently failing on branch-1
> -----------------------------------------------------------
>
> Key: HBASE-13391
> URL: https://issues.apache.org/jira/browse/HBASE-13391
> Project: HBase
> Issue Type: Bug
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Fix For: 2.0.0, 1.1.0
>
> Attachments: test.log.fail.txt, test.log.pass.txt
>
>
> TestRegionObserverInterface is frequently failing on branch-1.
> Example:
> {noformat}
> java.lang.AssertionError: Result of org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver$Legacy.getCtPreWALRestore is expected to be 1, while we get 0
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.verifyMethodResult(TestRegionObserverInterface.java:751)
>     at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.testLegacyRecovery(TestRegionObserverInterface.java:685)
> {noformat}