[ 
https://issues.apache.org/jira/browse/HBASE-24545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-24545:
----------------------------------
    Release Note: Adds backoff to the ServerCrashProcedure (SCP) wait on WAL 
split completion when there is a large backlog of files to split. (It's 
possible to avoid SCP blocking while waiting on WALs to split by using 
procedure-based splitting: set 'hbase.split.wal.zk.coordinated' to false to 
enable procedure-based WAL splitting.)

> Add backoff to SCP check on WAL split completion
> ------------------------------------------------
>
>                 Key: HBASE-24545
>                 URL: https://issues.apache.org/jira/browse/HBASE-24545
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> Crashed cluster. Lots of backed-up WALs. On startup, hundreds of servers 
> had to be recovered; each has a running SCP. Taking a thread dump during 
> recovery, I noticed 160 threads, each in an SCP waiting on split WAL 
> completion. Each thread was scanning the zk splitWAL directory every 100ms. 
> The dir had thousands of entries in it, so each check was pulling down MBs 
> from zk, times 160 threads (max configured PE threads (16) * 10 for the 
> KeepAlive factor, which makes 10 * configured PEs the max size of the PE 
> worker pool).
> If there are lots of remaining WALs to split, have the SCP back off on its 
> wait so it checks less frequently.
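
The idea described above, polling less often while the backlog remains, can be illustrated with a small capped exponential backoff. This is only a sketch and not the HBASE-24545 patch itself: the class name SplitWaitBackoff, the 10-second cap, the doubling schedule, and the injected "remaining" and "sleeper" callbacks are all assumptions made for illustration. Only the 100ms starting interval comes from the description.

```java
import java.util.function.IntSupplier;
import java.util.function.LongConsumer;

// Hypothetical helper, not part of HBase: capped exponential backoff for a polling wait.
final class SplitWaitBackoff {
    // 100ms is the fixed check interval mentioned in the issue description.
    static final long BASE_MILLIS = 100L;
    // Assumed upper bound on the wait between checks (illustrative value).
    static final long MAX_MILLIS = 10_000L;

    // Delay before the next check: BASE_MILLIS doubled once per prior check, capped.
    static long delayMillis(int attempt) {
        // Clamp the shift so the left shift cannot overflow a long.
        int shift = Math.min(Math.max(attempt, 0), 20);
        return Math.min(MAX_MILLIS, BASE_MILLIS << shift);
    }

    // Polls 'remaining' until it reports 0, pausing via 'sleeper' between checks.
    // Returns how many times it had to pause.
    static int waitUntilDone(IntSupplier remaining, LongConsumer sleeper) {
        int pauses = 0;
        while (remaining.getAsInt() > 0) {
            sleeper.accept(delayMillis(pauses));
            pauses++;
        }
        return pauses;
    }
}
```

In real use the sleeper would wrap Thread.sleep and handle interruption, and the "remaining" callback would be whatever check reports outstanding split work. With a fixed 100ms interval, 160 waiting threads issue about 1,600 directory scans per second; once each waiter has backed off to the assumed 10-second cap, that drops to about 16 per second.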



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
