[ https://issues.apache.org/jira/browse/HBASE-22810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912256#comment-16912256 ]
Zheng Hu commented on HBASE-22810:
----------------------------------
Thanks [~stack] for the fix. I read the UT code, and it is indeed a test that can
easily be flaky. For example, if all snapshot requests are submitted but the
snapshots are a bit slow, none of them will have completed when the assertion runs:
{code}
+ assertTrue("We expect at least 1 request to be rejected because of we concurrently" +
+ " issued many requests", takenSize < ssNum && takenSize > 0);
{code}
Then the assertion will fail, so +1 from me to remove it (I guess that after
increasing 'hbase.master.executor.snapshot.threads', this is now more likely
to happen).
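To illustrate the race described above, here is a minimal, self-contained sketch using plain `java.util.concurrent`. It is not the actual HBase test, and the class and method names are hypothetical. It submits several slow tasks (stand-ins for snapshot requests), reads the completion counter immediately after submission, which is effectively what the removed assertion did, and then reads it again after waiting for the pool to drain.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: shows why asserting on completions right after submit is racy.
class RacyAssertSketch {

    /** Returns {completedRightAfterSubmit, completedAfterAwait}. */
    static int[] submitSlowTasks(int requests, long taskMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(taskMillis); // stand-in for a slow snapshot
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                completed.incrementAndGet();
            });
        }
        // Racy read: with slow tasks this is usually 0, so a check such as
        // "takenSize > 0" fails even though nothing is actually broken.
        int rightAfterSubmit = completed.get();

        // Deterministic read: wait for every task to finish before counting.
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {rightAfterSubmit, completed.get()};
    }

    public static void main(String[] args) {
        int[] counts = submitSlowTasks(4, 1000);
        System.out.println("right after submit: " + counts[0] + ", after await: " + counts[1]);
    }
}
```

The point of the sketch is only the timing dependence: the first count depends on how fast the tasks run relative to the test thread, while the second does not.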
[[email protected]], thanks for the reminder. It's true that there are two
different config keys for the snapshot thread pool size, but I think they have
different meanings:
1. hbase.master.executor.snapshot.threads: the number of client snapshot
requests the master can handle at the same time;
2. hbase.snapshot.master.threads: the number of snapshot procedures the master
can coordinate with the region servers at the same time.
Config key #1 limits all snapshot requests, while key #2 only limits the
snapshot procedures coordinated with RSs (which are one part of a snapshot
request). Maybe we can unify the two config keys into one? We would still
initialize two different thread pools, with the same thread count, for the two
different purposes.
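As a rough illustration of that suggestion, here is a hypothetical sketch in which a single thread count is resolved once and then used to build two separate pools, one for client snapshot requests and one for the snapshot procedures coordinated with region servers. It uses `java.util.Properties` as a stand-in for HBase's Hadoop `Configuration`; the class name, the fallback order (key #1 first, then key #2) and the default of 1 are all assumptions for illustration, not HBase's actual behavior or defaults.

```java
import java.util.Properties;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one resolved thread count, two pools with different purposes.
class SnapshotThreadConfigSketch {
    static final String REQUEST_THREADS_KEY = "hbase.master.executor.snapshot.threads";
    static final String PROCEDURE_THREADS_KEY = "hbase.snapshot.master.threads";
    // Illustrative fallback only; not HBase's real default for either key.
    static final int ASSUMED_DEFAULT_THREADS = 1;

    /** Resolves a single thread count: key #1 first, then key #2, then the assumed default. */
    static int resolveThreads(Properties conf) {
        String value = conf.getProperty(REQUEST_THREADS_KEY);
        if (value == null) {
            value = conf.getProperty(PROCEDURE_THREADS_KEY);
        }
        return value == null ? ASSUMED_DEFAULT_THREADS : Integer.parseInt(value.trim());
    }

    /** A fixed-size pool, the same shape as Executors.newFixedThreadPool(threads). */
    static ThreadPoolExecutor newFixedPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(PROCEDURE_THREADS_KEY, "3");
        int threads = resolveThreads(conf);

        // Same size, different purposes: client requests vs. procedures coordinated with RSs.
        ThreadPoolExecutor requestPool = newFixedPool(threads);
        ThreadPoolExecutor procedurePool = newFixedPool(threads);
        System.out.println("request pool: " + requestPool.getCorePoolSize()
                + ", procedure pool: " + procedurePool.getCorePoolSize());

        requestPool.shutdown();
        procedurePool.shutdown();
    }
}
```

Which key should take precedence (or whether one should simply be deprecated) is an open design question; the order above is just one possible choice.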
> Initialize an separate ThreadPoolExecutor for taking/restoring snapshot
> ------------------------------------------------------------------------
>
> Key: HBASE-22810
> URL: https://issues.apache.org/jira/browse/HBASE-22810
> Project: HBase
> Issue Type: Improvement
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11, 2.0.7
>
>
> In the EventType class, we have the following definitions, which mean that taking a snapshot
> and restoring a snapshot both currently use the MASTER_TABLE_OPERATIONS executor:
> {code}
> /**
> * Messages originating from Client to Master.<br>
> * C_M_SNAPSHOT_TABLE<br>
> * Client asking Master to snapshot an offline table.
> */
> C_M_SNAPSHOT_TABLE (48, ExecutorType.MASTER_TABLE_OPERATIONS),
> /**
> * Messages originating from Client to Master.<br>
> * C_M_RESTORE_SNAPSHOT<br>
> * Client asking Master to restore a snapshot.
> */
> C_M_RESTORE_SNAPSHOT (49, ExecutorType.MASTER_TABLE_OPERATIONS),
> {code}
> But when I checked the MASTER_TABLE_OPERATIONS thread pool initialization, I
> saw:
> {code}
> private void startServiceThreads() throws IOException {
>   // ... some other code initializing ...
>   // We depend on there being only one instance of this executor running
>   // at a time. To do concurrency, would need fencing of enable/disable of
>   // tables.
>   // Any time changing this maxThreads to > 1, pls see the comment at
>   // AccessController#postCompletedCreateTableAction
>   this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);
>   startProcedureExecutor();
>   // ...
> }
> {code}
> That is to say, to make CPs enable or disable tables sequentially, we create
> a ThreadPoolExecutor with threadPoolSize=1. As a result, we cannot take
> snapshots concurrently even for completely different tables. For example, if
> there are two table snapshot requests and Table A takes 5 min to snapshot,
> Table B has to wait 5 min: only once Table A finishes its snapshot will
> Table B start its own.
> Since a snapshot timeout is also configured, Table B's snapshot can easily
> time out. Instead, we could create a separate thread pool for snapshot
> operations only.
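The proposal in the description can be sketched with a simplified, self-contained model (these are not HBase's real classes). The event codes 48 and 49 come from the quoted EventType definition; the new executor type name MASTER_SNAPSHOT_OPERATIONS, the class name and the snapshot pool size of 2 are assumptions for illustration. Each "snapshot" is simulated with a sleep, so the elapsed time shows whether the second table had to wait for the first.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified, hypothetical model of the proposed change; not HBase's actual classes.
class SeparateSnapshotPoolSketch {
    enum ExecutorType { MASTER_TABLE_OPERATIONS, MASTER_SNAPSHOT_OPERATIONS }

    enum EventType {
        // Codes as in the quoted EventType; only the executor type is changed.
        C_M_SNAPSHOT_TABLE(48, ExecutorType.MASTER_SNAPSHOT_OPERATIONS),
        C_M_RESTORE_SNAPSHOT(49, ExecutorType.MASTER_SNAPSHOT_OPERATIONS);

        final int code;
        final ExecutorType executorType;

        EventType(int code, ExecutorType executorType) {
            this.code = code;
            this.executorType = executorType;
        }
    }

    static Map<ExecutorType, ExecutorService> startPools(int snapshotThreads) {
        Map<ExecutorType, ExecutorService> pools = new EnumMap<>(ExecutorType.class);
        // Still a single thread, as in the quoted startServiceThreads().
        pools.put(ExecutorType.MASTER_TABLE_OPERATIONS, Executors.newFixedThreadPool(1));
        // Separate pool used only for snapshot events.
        pools.put(ExecutorType.MASTER_SNAPSHOT_OPERATIONS, Executors.newFixedThreadPool(snapshotThreads));
        return pools;
    }

    /** Submits one simulated snapshot per table and returns wall-clock ms until all finish. */
    static long snapshotTables(ExecutorService pool, int tables, long snapshotMillis) {
        CountDownLatch done = new CountDownLatch(tables);
        long start = System.nanoTime();
        for (int i = 0; i < tables; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(snapshotMillis); // stand-in for one table's snapshot
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        Map<ExecutorType, ExecutorService> pools = startPools(2);
        long shared = snapshotTables(pools.get(ExecutorType.MASTER_TABLE_OPERATIONS), 2, 500);
        long separate = snapshotTables(pools.get(EventType.C_M_SNAPSHOT_TABLE.executorType), 2, 500);
        System.out.println("single-thread pool: " + shared + " ms, separate pool: " + separate + " ms");
        pools.values().forEach(ExecutorService::shutdown);
    }
}
```

With the single-threaded pool, Table B's snapshot only starts after Table A's finishes, so its end-to-end time is roughly the sum of both, which is what makes the snapshot timeout easy to hit. With the separate pool the two run side by side, while MASTER_TABLE_OPERATIONS stays single-threaded, so the enable/disable ordering that the quoted comment relies on is unchanged.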
--
This message was sent by Atlassian Jira
(v8.3.2#803003)