[ 
https://issues.apache.org/jira/browse/HBASE-22810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912256#comment-16912256
 ] 

Zheng Hu commented on HBASE-22810:
----------------------------------

Thanks [~stack] for the fix. I read the UT code, and it is indeed a test that 
can easily be flaky. For example, all the snapshot requests may have been 
submitted while the snapshots themselves are a bit slow, so none has completed 
when the assert begins: 
{code}
+    assertTrue("We expect at least 1 request to be rejected because of we concurrently" +
+        " issued many requests", takenSize < ssNum && takenSize > 0);
{code}
Then the assert will fail, so +1 from me to remove it. (I guess that after 
increasing 'hbase.master.executor.snapshot.threads', this has become easy to 
hit.)
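
The race described above can be reproduced in isolation. The following is only a minimal sketch, not the actual UT: the class and method names are hypothetical, and a CountDownLatch stands in for a slow snapshot so that the outcome is deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-alone sketch; it does not use any HBase classes.
public class FlakyAssertSketch {

  /** Returns how many simulated snapshots had completed when the check ran. */
  static int completedAtCheckTime(int ssNum, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    // Held closed until after the check, i.e. every "snapshot" is slow.
    CountDownLatch slowSnapshot = new CountDownLatch(1);
    AtomicInteger taken = new AtomicInteger();
    try {
      for (int i = 0; i < ssNum; i++) {
        pool.submit(() -> {
          try {
            slowSnapshot.await();
            taken.incrementAndGet();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // All requests are submitted, but none has completed yet.
      int takenSize = taken.get();
      slowSnapshot.countDown();
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
      return takenSize;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    int ssNum = 4;
    int takenSize = completedAtCheckTime(ssNum, 3);
    // Same condition as the removed assert.
    boolean wouldPass = takenSize < ssNum && takenSize > 0;
    System.out.println("takenSize=" + takenSize + ", assert would pass: " + wouldPass);
  }
}
```

Here the check always sees zero completed snapshots, so the `takenSize > 0` half of the condition fails. In the real test the timing is not forced like this, which is why the failure would only show up intermittently.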

[~an...@apache.org], thanks for the reminder. It's true that there are two 
different config keys for the snapshot thread count, but I think they have 
different meanings:
1. hbase.master.executor.snapshot.threads: how many snapshot requests from 
clients we can handle on the master side at the same time; 
2. hbase.snapshot.master.threads: how many snapshot procedures we can 
coordinate with the region servers. 
Key #1 limits all snapshot requests, while key #2 limits only the snapshot 
procedure with the RS (which is one part of a snapshot request). Maybe we can 
unify the two config keys into one? We would still initialize two different 
thread pools, with the same thread count, for different purposes.
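
To make the idea concrete, here is a rough sketch of two pools sized from one key. It is only an illustration: a plain Map stands in for the HBase Configuration, the class and method names are hypothetical, and the default of 1 is an assumed placeholder rather than the real default of either key.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; a Map stands in for the HBase Configuration object.
public class SnapshotPoolsSketch {
  static final String REQUEST_THREADS_KEY = "hbase.master.executor.snapshot.threads";
  static final String PROCEDURE_THREADS_KEY = "hbase.snapshot.master.threads";
  // Assumed placeholder, not the real default of either key.
  static final int ASSUMED_DEFAULT_THREADS = 1;

  /** Reads a thread count from the config, falling back to the placeholder default. */
  static int threadsFor(Map<String, String> conf, String key) {
    String value = conf.get(key);
    return value == null ? ASSUMED_DEFAULT_THREADS : Integer.parseInt(value);
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of(REQUEST_THREADS_KEY, "3");

    // Unified variant: one key gives the size of both pools.
    int threads = threadsFor(conf, REQUEST_THREADS_KEY);
    // Pool 1: handles snapshot requests from clients on the master side.
    ExecutorService requestPool = Executors.newFixedThreadPool(threads);
    // Pool 2: coordinates the snapshot procedure with the region servers.
    ExecutorService procedurePool = Executors.newFixedThreadPool(threads);

    System.out.println("both pools sized to " + threads + " threads");
    requestPool.shutdown();
    procedurePool.shutdown();
  }
}
```

With two keys, the second pool would instead be sized by threadsFor(conf, PROCEDURE_THREADS_KEY), so the two limits could drift apart.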



> Initialize an separate ThreadPoolExecutor for taking/restoring snapshot 
> ------------------------------------------------------------------------
>
>                 Key: HBASE-22810
>                 URL: https://issues.apache.org/jira/browse/HBASE-22810
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11, 2.0.7
>
>
> In the EventType class we have the following definitions, which mean that 
> taking and restoring snapshots both use the MASTER_TABLE_OPERATIONS executor 
> now. 
> {code}
>   /**
>    * Messages originating from Client to Master.<br>
>    * C_M_SNAPSHOT_TABLE<br>
>    * Client asking Master to snapshot an offline table.
>    */
>   C_M_SNAPSHOT_TABLE        (48, ExecutorType.MASTER_TABLE_OPERATIONS),
>   /**
>    * Messages originating from Client to Master.<br>
>    * C_M_RESTORE_SNAPSHOT<br>
>    * Client asking Master to restore a snapshot.
>    */
>   C_M_RESTORE_SNAPSHOT      (49, ExecutorType.MASTER_TABLE_OPERATIONS),
> {code}
> But when I checked the MASTER_TABLE_OPERATIONS thread pool initialization, I 
> saw: 
> {code}
>   private void startServiceThreads() throws IOException{
>    // ...  some other code initializing .... 
>    // We depend on there being only one instance of this executor running
>    // at a time.  To do concurrency, would need fencing of enable/disable of
>    // tables.
>    // Any time changing this maxThreads to > 1, pls see the comment at
>    // AccessController#postCompletedCreateTableAction
>    this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);
>    startProcedureExecutor();
> {code}
> That is to say, so that CPs enable or disable tables sequentially, we create 
> a ThreadPoolExecutor with threadPoolSize=1. As a result we cannot actually 
> take snapshots concurrently, even for totally different tables. Say there are 
> two table snapshot requests and Table A takes 5 min to snapshot: Table B has 
> to wait those 5 min and can start its snapshot only once Table A has finished.
> Since we have set a snapshot timeout, the Table B snapshot can easily time 
> out. Actually, we can create a separate thread pool for snapshot operations 
> only.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
