[ 
https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091568#comment-16091568
 ] 

Stephen Yuan Jiang commented on HBASE-16488:
--------------------------------------------

The V10 patch for branch-1 has been approved by [~enis].  

Most tests passed in pre-commit.  For the failed UTs, I checked the source code 
and don't think the failures are related to this change.  I re-ran those tests 
locally, and all except one passed.  

The only test that fails consistently on my local machine is 
{{org.apache.hadoop.hbase.regionserver.TestRSKilledWhenInitializing.testRSTerminationAfterRegisteringToMasterBeforeCreatingEphemeralNode}}.
I spent some time debugging it and don't think the failure is related to this 
change.  The test kills one RS and asserts that the server manager no longer 
considers this RS online.  Without any change, the test passes consistently on 
my local machine.  When I add some logging to the test (just some LOG.info 
statements inside the test, no other changes) to see what is going on, it 
fails consistently because the server manager thinks the RS is still online.  
If I add a wait before the assert, the test passes with a wait of about 600ms 
on my local machine.  This is with only LOG.info messages in the test and no 
real change.  There seems to be a delay between "mini cluster get live server 
thinks the RS is dead" and "master server manager removes the RS from the 
online server list".  With the patch, the same is true: with a delay of about 
600ms (which has nothing to do with namespace), the test passes.  I think this 
is a test issue; if it reproduces consistently in pre-commit, I will fix the 
test in a separate JIRA.
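
The "wait before assert" idea can be sketched as a bounded poll instead of a 
fixed sleep.  This is only an illustration using the plain JDK: the class 
{{WaitForCondition}}, the method {{waitFor}}, and the simulated 600ms condition 
are hypothetical names, not code from the patch or from the test itself.

```java
import java.util.function.BooleanSupplier;

public class WaitForCondition {

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: stands in for "the server manager no longer lists the RS
        // as online", which becomes true roughly 600 ms after the RS is killed.
        long killedAt = System.nanoTime();
        BooleanSupplier rsRemovedFromOnlineList =
            () -> System.nanoTime() - killedAt > 600_000_000L;

        boolean removed = waitFor(5_000, 50, rsRemovedFromOnlineList);
        System.out.println("RS removed from online list within timeout: " + removed);
    }
}
```

A bounded poll like this keeps the assertion strict (the test still fails if 
the RS is never removed) while tolerating the short delay described above.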

> Starting namespace and quota services in master startup asynchronizely
> ----------------------------------------------------------------------
>
>                 Key: HBASE-16488
>                 URL: https://issues.apache.org/jira/browse/HBASE-16488
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>         Attachments: HBASE-16488.v10-branch-1.patch, 
> HBASE-16488.v1-branch-1.patch, HBASE-16488.v1-master.patch, 
> HBASE-16488.v2-branch-1.patch, HBASE-16488.v2-branch-1.patch, 
> HBASE-16488.v3-branch-1.patch, HBASE-16488.v3-branch-1.patch, 
> HBASE-16488.v4-branch-1.patch, HBASE-16488.v5-branch-1.patch, 
> HBASE-16488.v6-branch-1.patch, HBASE-16488.v7-branch-1.patch, 
> HBASE-16488.v8-branch-1.patch, HBASE-16488.v9-branch-1.patch
>
>
> From time to time, in internal IT tests and in customer reports, we see 
> master initialization fail because the namespace table region takes a long 
> time to assign (e.g. log splitting takes a long time or hangs, an RS is 
> temporarily unavailable, or there is some unknown assignment issue).  In the 
> past, there were several proposals to improve this situation, e.g. 
> HBASE-13556 / HBASE-14190 (Assign system tables ahead of user region 
> assignment) or HBASE-13557 (Special WAL handling for system tables) or 
> HBASE-14623 (Implement dedicated WAL for system tables).  
> This JIRA proposes another way to solve this master initialization failure: 
> the namespace service is only used by a handful of operations (e.g. create 
> table / namespace DDL / get namespace API / some RS group DDL).  Only the 
> quota manager depends on it, and quota management is off by default.  
> Therefore, the namespace service is not really needed for the master to be 
> functional, so we could start the namespace service asynchronously without 
> blocking master startup.
>  
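
The proposal in the description can be illustrated with a small, 
self-contained sketch.  It is not the patch and does not use HBase classes: 
{{SketchMaster}}, {{awaitQuietly}}, and the latch that simulates a slow 
namespace region assignment are all hypothetical.  The sketch only shows the 
general shape, where startup does not wait for the slow service and only the 
operations that need it check for readiness.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AsyncNamespaceStart {

    /** Hypothetical, heavily simplified master; not HBase's HMaster. */
    public static class SketchMaster {
        private final CompletableFuture<Void> namespaceReady;
        private volatile boolean initialized;

        public SketchMaster(Runnable namespaceStartup) {
            // Kick off the slow startup in the background ...
            this.namespaceReady = CompletableFuture.runAsync(namespaceStartup);
            // ... and finish master initialization without waiting for it.
            this.initialized = true;
        }

        public boolean isInitialized() {
            return initialized;
        }

        public boolean isNamespaceReady() {
            return namespaceReady.isDone() && !namespaceReady.isCompletedExceptionally();
        }

        /** Waits up to timeoutMs for the background startup; true on success. */
        public boolean awaitNamespace(long timeoutMs) {
            try {
                namespaceReady.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (Exception e) {
                // ExecutionException or TimeoutException: not ready.
                return false;
            }
        }

        /** Only operations that depend on the namespace service check readiness. */
        public String createTable(String name) {
            if (!isNamespaceReady()) {
                throw new IllegalStateException("namespace service is not ready yet");
            }
            return "created " + name;
        }
    }

    /** Blocks on the latch; used here to simulate a slow region assignment. */
    public static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        CountDownLatch slowAssignment = new CountDownLatch(1);
        SketchMaster master = new SketchMaster(() -> awaitQuietly(slowAssignment));
        System.out.println("initialized=" + master.isInitialized()
            + ", namespaceReady=" + master.isNamespaceReady());

        slowAssignment.countDown();          // the "assignment" finally completes
        master.awaitNamespace(5_000);
        System.out.println(master.createTable("t1"));
    }
}
```

In this sketch a slow or hanging namespace startup delays only the operations 
that call {{createTable}}, not {{isInitialized()}}, which mirrors the trade-off 
the description argues for.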



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
