[jira] [Commented] (HBASE-19554) AbstractTestDLS.testThreeRSAbort sometimes fails in pre commit

Duo Zhang (JIRA) Tue, 20 Feb 2018 00:17:45 -0800

    [ https://issues.apache.org/jira/browse/HBASE-19554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369800#comment-16369800 ]


Duo Zhang commented on HBASE-19554:
-----------------------------------

I thought this was a problem we had already addressed? We have already 
closed the ConnectionImplementation from outside, so HMaster initialization 
should quit immediately. But it does not: it keeps retrying on the closed 
connection (tries=11, started more than 48 seconds ago in the log below). Let 
me check what's wrong here...

{noformat}
2018-02-20 02:31:12,578 DEBUG [M:0;asf911:38379] 
client.RpcRetryingCallerImpl(132): Call exception, tries=11, retries=11, 
started=48515 ms ago, cancelled=false, msg=hconnection-0x195e9495 closed, 
details=row 'default' on table 'hbase:namespace' at 
region=hbase:namespace,,1519093804365.3ff104b43ef4ce0a491e1f26b598813e., 
hostname=asf911.gq1.ygridcore.net,51776,1519093799312, seqNum=2, 
exception=java.io.IOException: hconnection-0x195e9495 closed
        at org.apache.hadoop.hbase.client.ConnectionImplementation.getTableState(ConnectionImplementation.java:1959)
        at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getTableState(ConnectionUtils.java:131)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.isTableDisabled(ConnectionImplementation.java:573)
        at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.isTableDisabled(ConnectionUtils.java:131)
        at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:219)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:141)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:278)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:103)
        at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62)
        at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
        at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1053)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:919)
        at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2017)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:559)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
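The behavior the comment expects, giving up as soon as the connection has been closed instead of retrying as the trace shows, can be illustrated with a minimal plain-Java sketch. It is only an illustration under stated assumptions: ClosedAwareRetry, ClosableConn and DoNotRetryException are hypothetical names, not HBase's client classes, and this is not the fix committed for this issue. The idea is that a closed connection makes every later attempt fail identically, so the loop checks for that state before each attempt and treats it as non-retryable.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch, not HBase's client API: a retry loop that gives up
// as soon as the underlying connection has been closed.
final class ClosedAwareRetry {

    // Hypothetical stand-in for a connection that may be closed from outside.
    interface ClosableConn {
        boolean isClosed();
    }

    // Hypothetical marker for failures that retrying cannot fix.
    static final class DoNotRetryException extends IOException {
        DoNotRetryException(String msg) {
            super(msg);
        }
    }

    static <T> T callWithRetries(ClosableConn conn, Callable<T> call, int maxTries, long pauseMs)
            throws Exception {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be >= 1");
        }
        IOException last = null;
        for (int tries = 0; tries < maxTries; tries++) {
            // Once the connection is closed every further attempt fails the
            // same way, so fail fast instead of sleeping and retrying.
            if (conn.isClosed()) {
                throw new DoNotRetryException("connection closed after " + tries + " tries");
            }
            try {
                return call.call();
            } catch (DoNotRetryException e) {
                throw e; // non-retryable: propagate immediately
            } catch (IOException e) {
                last = e; // possibly transient: remember it and retry
                Thread.sleep(pauseMs); // fixed pause for brevity; real clients back off
            }
        }
        throw last; // non-null here: every iteration caught an IOException
    }
}
```

Ordinary IOExceptions are still retried up to maxTries; only the "closed" state (or an explicitly non-retryable exception from the callable) short-circuits the loop, which is the behavior HMaster initialization would need to quit promptly.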


> AbstractTestDLS.testThreeRSAbort sometimes fails in pre commit
> --------------------------------------------------------------
>
>                 Key: HBASE-19554
>                 URL: https://issues.apache.org/jira/browse/HBASE-19554
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Recovery, wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>         Attachments: HBASE-19554-thread-dump.patch, HBASE-19554.patch
>
>
> https://builds.apache.org/job/PreCommit-HBASE-Build/10554/artifact/patchprocess/patch-unit-hbase-server.txt
> The error message is a bit strange:
> {quote}
> [ERROR] testThreeRSAbort(org.apache.hadoop.hbase.master.TestDLSAsyncFSWAL) 
> Time elapsed: 20.627 s <<< ERROR!
> org.apache.hadoop.hbase.TableNotFoundException: Region of 
> 'hbase:namespace,,1513320505933.451650152885a3b41d0b1110deca513c.' is 
> expected in the table of 'testThreeRSAbort', but hbase:meta says it is in the 
> table of 'hbase:namespace'. hbase:meta might be damaged.
> {quote}
> It fails for both FSHLog and AsyncFSWAL. Need to dig more.
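The quoted message is strange because a lookup for 'testThreeRSAbort' came back with a region whose name starts with 'hbase:namespace'. A region name of the form shown, "<table>,<startKey>,<regionId>.<encodedName>.", carries its table as the prefix before the first comma, so the mismatch can be detected by comparing that prefix with the table that was asked for. The sketch below is a hypothetical illustration of that kind of consistency check; RegionTableCheck and its methods are made-up names, it throws a plain IllegalStateException, and it is not HBase's actual implementation.

```java
// Hypothetical illustration, not HBase code: detect a region that belongs to
// a different table than the one the lookup was for.
final class RegionTableCheck {

    // Returns the table-name prefix of a region name, assuming the
    // "<table>,<startKey>,<regionId>.<encodedName>." layout, where the
    // table name itself never contains a comma.
    static String tableOf(String regionName) {
        int comma = regionName.indexOf(',');
        if (comma <= 0) {
            throw new IllegalArgumentException("not a region name: " + regionName);
        }
        return regionName.substring(0, comma);
    }

    // Fails loudly if the region returned for expectedTable belongs elsewhere.
    static void checkRegionBelongsTo(String regionName, String expectedTable) {
        String actual = tableOf(regionName);
        if (!actual.equals(expectedTable)) {
            throw new IllegalStateException("Region of '" + regionName
                + "' is expected in the table of '" + expectedTable
                + "', but it is in the table of '" + actual + "'");
        }
    }
}
```

Applied to the region name from the report with expected table 'testThreeRSAbort', the check throws, mirroring the reported error; the open question in this issue is why the lookup returned the namespace region in the first place.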



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
