[ https://issues.apache.org/jira/browse/HBASE-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144863#comment-13144863 ]

Phabricator commented on HBASE-4742:
------------------------------------

Liyin has commented on the revision "[jira] [HBASE-4742] Split dead server's 
log in parallel".

  Thanks, Mikhail, for your quick response.
  We have agreed on most of the points discussed here.

  The remaining discussion focuses on the number of threads the master launches 
to split dead servers' logs, which has made me reconsider our motivation for 
parallel distributed log splitting.

  Our basic motivation is that log splitting should not block the region server 
processing queue. Also, distributed log splitting itself is designed to split 
the logs of a large number of region servers. So we could batch all the dead 
region servers together into a queue and launch a single thread to do the 
distributed log splitting, instead of running a separate distributed split 
thread for each dead server.
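The batching alternative described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch under review: `BatchedDeadServerLogSplitter` is a hypothetical class, and the `splitBatch` callback is a hypothetical stand-in for whatever call runs one distributed log split over a list of servers. The shutdown handler only enqueues the dead server and returns, while a single background thread drains everything queued so far and splits it as one batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not HBase code): shutdown handlers enqueue dead
 * servers, and one background thread drains everything queued so far and
 * hands it to a single batched distributed split.
 */
class BatchedDeadServerLogSplitter {
  private final BlockingQueue<String> deadServers = new LinkedBlockingQueue<>();
  // Hypothetical stand-in for a batched distributed log split call.
  private final Consumer<List<String>> splitBatch;
  private final Thread worker = new Thread(this::drainLoop, "dead-server-log-splitter");

  BatchedDeadServerLogSplitter(Consumer<List<String>> splitBatch) {
    this.splitBatch = splitBatch;
    worker.setDaemon(true);
  }

  void start() {
    worker.start();
  }

  /** Called by the shutdown handler; returns immediately instead of splitting. */
  void enqueue(String serverName) {
    deadServers.add(serverName);
  }

  private void drainLoop() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        List<String> batch = new ArrayList<>();
        batch.add(deadServers.take()); // wait for at least one dead server
        deadServers.drainTo(batch);    // plus any that died in the meantime
        splitBatch.accept(batch);      // one distributed split for the whole batch
      }
    } catch (InterruptedException e) {
      // Interrupted by stop(): exit the loop.
    }
  }

  void stop() throws InterruptedException {
    worker.interrupt();
    worker.join();
  }
}
```

With this shape the master runs one splitter thread no matter how many servers die, and servers that die while a split is in progress are picked up together by the next batch.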



INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:333 
Thank you for clarifying your concern. :)

  For each dead server, the master receives the znode expiration event only 
once.
  So the master never has two threads splitting the same dead region server's 
log at the same time.
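The argument above relies on ZooKeeper delivering the expiration event once per dead server. If one wanted the master to enforce that invariant explicitly rather than rely on it, a minimal sketch could track in-flight splits by server name. `SplitGuard`, `tryClaim` and `release` are hypothetical names used only for illustration; they are not part of HBase.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard (not HBase code): at most one in-flight split per dead server. */
class SplitGuard {
  // Concurrent set of server names whose logs are currently being split.
  private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

  /** Returns true only for the first caller to claim this server. */
  boolean tryClaim(String serverName) {
    return inFlight.add(serverName);
  }

  /** Releases the claim once the server's log has been split. */
  void release(String serverName) {
    inFlight.remove(serverName);
  }
}
```

A handler would call `tryClaim` before starting a split and skip the work if it returns false; under the once-per-server event guarantee described above, that branch should never be taken.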
  src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:337 
Fine. I will change it to "Succeeded in splitting".
  src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:347-348 
Thanks for clarifying.
  src/test/java/org/apache/hadoop/hbase/master/TestMultiRegionServerShutDown.java:135
Actually, I don't need to catch the exceptions here explicitly.
  Doing so doesn't affect the unit test results.
  Thanks for the discussion.
  src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:312 
1) We use distributed log splitting for the dead region server as well.

  2) Even if we use a thread pool to execute the splits, I would bound the 
maximum number of threads by the number of region servers.
  What do you think the maximum number of threads in this executor pool should 
be?

  Also, in your example where 500 region servers go down, the master would 
launch 500 threads to run distributed log splitting in parallel. That won't 
choke the master much, since the actual splitting work is done on the region 
servers.

  3) This discussion also raises another good point: if a large number of 
region servers die for some reason, should we batch these dead region servers 
into a single split instead of splitting their logs in parallel?

  Any ideas, Mikhail and Prakash?
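The bounded thread pool proposed in point 2) can be sketched as follows. This is a minimal illustration under stated assumptions: `ParallelDeadServerLogSplitter` is a hypothetical class, and the `splitOneServer` callback is a hypothetical stand-in for the per-server distributed log split (it is not an HBase API). Each dead server gets its own task, and the pool never grows beyond the number of region servers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical sketch (not HBase code): one split task per dead server on a bounded pool. */
class ParallelDeadServerLogSplitter {
  private final ExecutorService pool;
  // Hypothetical stand-in for a per-server distributed log split call.
  private final Consumer<String> splitOneServer;

  ParallelDeadServerLogSplitter(int numRegionServers, Consumer<String> splitOneServer) {
    // Bound the pool by the cluster size, as proposed in point 2).
    this.pool = Executors.newFixedThreadPool(Math.max(1, numRegionServers));
    this.splitOneServer = splitOneServer;
  }

  /** Splits every dead server's log in parallel and waits for all of them. */
  void splitAll(List<String> deadServers) throws InterruptedException, ExecutionException {
    List<Callable<Void>> tasks = new ArrayList<>();
    for (String server : deadServers) {
      tasks.add(() -> {
        splitOneServer.accept(server);
        return null;
      });
    }
    // invokeAll blocks until every task finishes; get() rethrows any task failure.
    for (Future<Void> f : pool.invokeAll(tasks)) {
      f.get();
    }
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

In this shape each pool thread spends most of its time waiting while the region servers do the actual splitting, which is the reason given above for expecting even a pool as large as the cluster not to load the master heavily.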



REVISION DETAIL
  https://reviews.facebook.net/D237

                
> Split dead server's log in parallel
> -----------------------------------
>
>                 Key: HBASE-4742
>                 URL: https://issues.apache.org/jira/browse/HBASE-4742
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D237.1.patch, D237.2.patch, D237.3.patch, D237.4.patch
>
>
> When a region server goes down, the master will shut down the region server 
> and split its log.
> However, log splitting is a blocking call and can take some time.
> If more than one region server goes down, the master splits their logs one by 
> one, which is not efficient.
> Since we have distributed log splitting, we could split the logs from these 
> dead servers in parallel. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
