[ https://issues.apache.org/jira/browse/HBASE-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628481#comment-13628481 ]
Jonathan Hsieh commented on HBASE-7636:
---------------------------------------
I bumped some of the timeouts higher and improved the failure messages in an
experiment.
The test basically creates a table spread across 6 RSs (each RS running as a
thread in the minicluster), kills 3 of the RSs, and then makes sure that all
regions come back online.
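Roughly, the shape of the test is something like this (a sketch only, not the
actual TestDistributedLogSplitting code; countOnlineRegions(), NUM_REGIONS and
TIMEOUT_MS are placeholders):
{code}
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.MiniHBaseCluster;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

public class ThreeRSAbortSketch {
  // Placeholders -- the real test derives these from the table it creates.
  private static final int NUM_REGIONS = 41;
  private static final long TIMEOUT_MS = 300000;

  @Test
  public void sketchOfThreeRSAbort() throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster(6);                  // 6 RSs, each a thread in this JVM
    util.createTable(Bytes.toBytes("table"), Bytes.toBytes("family"));
    // ... load enough data that the regions are spread across all 6 RSs ...

    MiniHBaseCluster cluster = util.getHBaseCluster();
    for (int i = 0; i < 3; i++) {
      cluster.abortRegionServer(i);            // kill 3 of the 6 RSs
    }

    // Distributed log splitting should run and the master should reassign every
    // region; this wait/assert is what produces the failure message quoted below.
    long deadline = System.currentTimeMillis() + TIMEOUT_MS;
    while (countOnlineRegions(cluster) < NUM_REGIONS) {
      assertTrue("Took too long to get all the regions back online",
          System.currentTimeMillis() < deadline);
      Thread.sleep(200);
    }
    util.shutdownMiniCluster();
  }

  // Placeholder: the real test counts the regions actually online on the live RSs.
  private int countOnlineRegions(MiniHBaseCluster cluster) {
    throw new UnsupportedOperationException("placeholder for this sketch");
  }
}
{code}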
When running this, we get into a state where a region cannot be opened after an
RS is aborted:
{code}
<failure message="Took too long to get all the regions back online. Have 36 but want at least 41" type="java.lang.AssertionError">java.lang.AssertionError: Took too long to get all the regions back online. Have 36 but want at least 41
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.hadoop.hbase.master.TestDistributedLogSplitting.testThreeRSAbort(TestDistributedLogSplitting.java:276)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
</failure>
{code}
The root cause is some sort of Hadoop access control problem (showing up with
different combinations of IPC Server handler numbers and jon.hfs.* users):
{code}
2013-04-10 14:46:17,426 ERROR [IPC Server handler 7 on 46469] security.UserGroupInformation(1370): PriviledgedActionException as:jon.hfs.1 (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Can't continue with getBlockLocalPathInfo() authorization. The user jon.hfs.1 is not allowed to call getBlockLocalPathInfo
{code}
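getBlockLocalPathInfo() is the DataNode call behind the (legacy) HDFS
short-circuit local reads, and in this generation of Hadoop it is gated by a
per-user whitelist on the DataNode. So my working hypothesis (an assumption on
my part, not a confirmed diagnosis) is that the mini-cluster enables
short-circuit reads while the per-RS jon.hfs.* users are not whitelisted. The
knobs involved look roughly like this (values are examples, not what the test
actually sets):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Assumption / illustration only: legacy short-circuit read settings.
Configuration conf = HBaseConfiguration.create();
// Client side: let the DFS client read local blocks directly off disk.
conf.setBoolean("dfs.client.read.shortcircuit", true);
// DataNode side: only these users may call getBlockLocalPathInfo().
conf.set("dfs.block.local-path-access.user", "jon.hfs.1,jon.hfs.2,jon.hfs.3");
{code}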
when trying to open the region (on different region servers, but always with
the same region failing):
{code}
2013-04-10 14:44:16,384 ERROR [RS_OPEN_REGION-localhost,50866,1365630214743-0] handler.OpenRegionHandler(464): Failed open of region=table,,1365630217156.7c1b2def33bf6627d581ecf6fbb389f3., starting to roll back the global memstore size.
{code}
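That would line up with the access error above: opening a region means reading
its store files (and any recovered edits) through the DFS client, and with
short-circuit reads enabled a local read asks the DataNode for the block's
on-disk path, i.e. exactly the call being denied. A simplified illustration
(not HBase code; the path is made up):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Simplified illustration of the read path only -- not HBase code.
Configuration conf = new Configuration();
conf.setBoolean("dfs.client.read.shortcircuit", true);
FileSystem fs = FileSystem.get(conf);
// Path is made up; roughly /hbase/<table>/<encoded region>/<family>/<hfile>.
Path storeFile = new Path("/hbase/table/7c1b2def33bf6627d581ecf6fbb389f3/family/somefile");
FSDataInputStream in = fs.open(storeFile);
// With short-circuit on and the block local, the first read makes the DFS client
// call getBlockLocalPathInfo() on the DataNode -- which throws the
// AccessControlException above when the caller is not whitelisted.
in.read(new byte[64]);
in.close();
{code}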
> TestDistributedLogSplitting#testThreeRSAbort fails against hadoop 2.0
> ---------------------------------------------------------------------
>
> Key: HBASE-7636
> URL: https://issues.apache.org/jira/browse/HBASE-7636
> Project: HBase
> Issue Type: Sub-task
> Reporter: Ted Yu
> Assignee: Jonathan Hsieh
>
> From https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/364/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testThreeRSAbort/ :
> {code}
> 2013-01-21 11:49:34,276 DEBUG [MASTER_SERVER_OPERATIONS-juno.apache.org,57966,1358768818594-0] client.HConnectionManager$HConnectionImplementation(956): Looked up root region location, connection=hconnection 0x12f19fe; serverName=juno.apache.org,55531,1358768819479
> 2013-01-21 11:49:34,278 INFO [MASTER_SERVER_OPERATIONS-juno.apache.org,57966,1358768818594-0] catalog.CatalogTracker(576): Failed verification of .META.,,1 at address=juno.apache.org,57582,1358768819456; org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: juno.apache.org/67.195.138.61:57582
> {code}