[ 
https://issues.apache.org/jira/browse/HBASE-25229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-25229.
-------------------------------
      Assignee:     (was: Jeongdae Kim)
    Resolution: Won't Fix

All 1.x release lines are EOL.

Feel free to reopen if this also affects 2.x and master.

> Instantiate BucketCache before RS creates its ephemeral node during 
> rolling upgrade
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-25229
>                 URL: https://issues.apache.org/jira/browse/HBASE-25229
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.5.0, 1.6.0, 1.7.0, 1.4.13
>            Reporter: Jeongdae Kim
>            Priority: Minor
>
> We observed that many clients couldn't get region location information for 
> tens of seconds during a rolling upgrade from 1.2.x to 1.4.x, and all requests 
> to regions moved by the graceful restart failed.
>  
> The reasons are:
> # Since HBASE-17931, system tables are assigned to the RS with the highest version.
> # Since HBASE-12034, bucket cache initialization has moved from RS 
> instantiation to the RS initialization process that runs after reporting to 
> the master; moreover, the ephemeral node for the RS is created before the 
> bucket cache.
> # When using an off-heap bucket cache, allocating its memory takes a long 
> time (18 seconds for 31GB in our case):
> [https://github.com/apache/hbase/blob/branch-1.4/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferArray.java#L52-L72]
> # Once the ephemeral node is created, the master tries to move system regions 
> to the RS with the highest version when the first RS restarts during the 
> rolling restart. However, because of 3), that RS is not yet ready to serve 
> system regions, so moving them keeps failing until 3) finishes.
>  
> I think this can happen only in branch-1, because in HBase 2.x the ephemeral 
> node is created after the block caches. There is no need to create the block 
> caches after the ephemeral node at all.
>  
> I verified this issue could be resolved by just changing their creation order.
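
The ordering problem described above can be illustrated with a small, self-contained sketch. It is not HBase code: the class, enum, and method names are hypothetical, and it assumes the server becomes ready only once every startup step has finished. It models the time during which the master can already see the region server (the ephemeral node exists) while the server cannot yet serve regions (the bucket cache is still being allocated).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of region server startup ordering; not an HBase API.
public class StartupOrderSketch {

    enum Step { CREATE_EPHEMERAL_NODE, ALLOCATE_BUCKET_CACHE }

    // Returns how long (in ms) the server is visible to the master but
    // still unable to serve regions, assuming it is ready only after the
    // last step in 'order' completes.
    static long visibleButNotReadyMillis(List<Step> order, long cacheAllocMillis) {
        long clock = 0;       // simulated elapsed startup time
        long visibleAt = -1;  // time at which the ephemeral node appears
        for (Step step : order) {
            switch (step) {
                case CREATE_EPHEMERAL_NODE:
                    visibleAt = clock;          // creating the node is treated as instant
                    break;
                case ALLOCATE_BUCKET_CACHE:
                    clock += cacheAllocMillis;  // allocation dominates startup time
                    break;
            }
        }
        if (visibleAt < 0) {
            throw new IllegalArgumentException("order never creates the ephemeral node");
        }
        return clock - visibleAt;
    }

    public static void main(String[] args) {
        long allocMillis = 18_000L; // 18 seconds for 31GB, the figure from the report

        List<Step> nodeFirst = Arrays.asList(
                Step.CREATE_EPHEMERAL_NODE, Step.ALLOCATE_BUCKET_CACHE);
        List<Step> cacheFirst = Arrays.asList(
                Step.ALLOCATE_BUCKET_CACHE, Step.CREATE_EPHEMERAL_NODE);

        System.out.println("node first:  "
                + visibleButNotReadyMillis(nodeFirst, allocMillis) + " ms");
        System.out.println("cache first: "
                + visibleButNotReadyMillis(cacheFirst, allocMillis) + " ms");
    }
}
```

Under these assumptions, creating the node first leaves a window as long as the cache allocation itself, while allocating the cache first closes that window, which matches the fix the reporter describes.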



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
