[ 
https://issues.apache.org/jira/browse/HBASE-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628481#comment-13628481
 ] 

Jonathan Hsieh commented on HBASE-7636:
---------------------------------------

I bumped some of the timeouts higher and improved the failure messages in an 
experiment.

The test basically creates a table on 6 region servers (threads), kills 3 of 
them, and then makes sure that all regions come back online.
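
For illustration only, the "wait until all regions are back, otherwise fail 
with a descriptive message" step could be sketched roughly as below. This is 
not the actual code in TestDistributedLogSplitting; the class name RegionWait, 
the helper names, and the IntSupplier-based region counter are hypothetical 
stand-ins, and the sketch assumes Java 8 or later.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not part of HBase: polls a region counter until
// enough regions are online or a deadline passes.
final class RegionWait {

    /**
     * Polls onlineRegions until it reports at least `expected` regions or
     * `timeoutMs` milliseconds have elapsed. Returns the last observed count.
     */
    static int waitForRegions(IntSupplier onlineRegions, int expected,
                              long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        int have = onlineRegions.getAsInt();
        while (have < expected && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting early.
                Thread.currentThread().interrupt();
                break;
            }
            have = onlineRegions.getAsInt();
        }
        return have;
    }

    /** Builds a failure message that reports both the observed and wanted counts. */
    static String failureMessage(int have, int expected) {
        return "Took too long to get all the regions back online. Have " + have
            + " but want at least " + expected;
    }
}
```

A test would call waitForRegions with whatever counts the online regions in 
the mini cluster and fail with failureMessage(have, expected) if the returned 
count is still below the expected one.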

When running this, we get to a state where a region cannot be opened after an 
RS is aborted.
{code}
    <failure message="Took too long to get all the regions back online. Have 36 but want at least 41" type="java.lang.AssertionError">java.lang.AssertionError: Took too long to get all the regions back online. Have 36 but want at least 41
        at org.junit.Assert.fail(Assert.java:88)
        at org.apache.hadoop.hbase.master.TestDistributedLogSplitting.testThreeRSAbort(TestDistributedLogSplitting.java:276)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
</failure>
{code}

The root cause appears to be some sort of Hadoop access control problem (seen 
with different combinations of IPC Server handler # and user jon.hfs.*):

{code}
2013-04-10 14:46:17,426 ERROR [IPC Server handler 7 on 46469] security.UserGroupInformation(1370): PriviledgedActionException as:jon.hfs.1 (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Can't continue with getBlockLocalPathInfo() authorization. The user jon.hfs.1 is not allowed to call getBlockLocalPathInfo
{code}

It shows up when trying to open the region (the region server differs between 
occurrences, but it is always the same region that fails):
{code}
2013-04-10 14:44:16,384 ERROR [RS_OPEN_REGION-localhost,50866,1365630214743-0] handler.OpenRegionHandler(464): Failed open of region=table,,1365630217156.7c1b2def33bf6627d581ecf6fbb389f3., starting to roll back the global memstore size.
{code}

                
> TestDistributedLogSplitting#testThreeRSAbort fails against hadoop 2.0
> ---------------------------------------------------------------------
>
>                 Key: HBASE-7636
>                 URL: https://issues.apache.org/jira/browse/HBASE-7636
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Ted Yu
>            Assignee: Jonathan Hsieh
>
> From 
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/364/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testThreeRSAbort/
>  :
> {code}
> 2013-01-21 11:49:34,276 DEBUG 
> [MASTER_SERVER_OPERATIONS-juno.apache.org,57966,1358768818594-0] 
> client.HConnectionManager$HConnectionImplementation(956): Looked up root 
> region location, connection=hconnection 0x12f19fe; 
> serverName=juno.apache.org,55531,1358768819479
> 2013-01-21 11:49:34,278 INFO  
> [MASTER_SERVER_OPERATIONS-juno.apache.org,57966,1358768818594-0] 
> catalog.CatalogTracker(576): Failed verification of .META.,,1 at 
> address=juno.apache.org,57582,1358768819456; 
> org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is 
> in the failed servers list: juno.apache.org/67.195.138.61:57582
> {code}

