[
https://issues.apache.org/jira/browse/HBASE-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144863#comment-13144863
]
Phabricator commented on HBASE-4742:
------------------------------------
Liyin has commented on the revision "[jira] [HBASE-4742] Split dead server's
log in parallel".
Thanks, Mikhail, for your quick response.
We have agreed on most of the points discussed here.
The remaining discussion focuses on the number of threads launched in the
master for splitting dead servers' logs, which has made me reconsider our
motivation for parallel distributed log splitting here.
Our basic motivation is that splitting logs should not block the region server
process queue. Also, distributed log splitting itself is designed to split
logs for a large number of region servers. So we could batch all the dead
region servers together into a queue and launch a single thread to do the
distributed log splitting, instead of running distributed log splitting for
each dead server in a separate thread.
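
The batching idea above could be sketched roughly as follows. This is only an
illustration, not code from the patch: the class DeadServerBatcher, its
methods, and the splitLogs callback (standing in for the distributed log split
call) are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch: one worker thread splits logs for queued dead servers in batches. */
class DeadServerBatcher {
    private final BlockingQueue<String> deadServers = new LinkedBlockingQueue<>();

    /** Called from the shutdown handler; returns immediately so the handler is not blocked. */
    void enqueue(String serverName) {
        deadServers.add(serverName);
    }

    /** Non-blocking: removes and returns every dead server queued so far (possibly none). */
    List<String> drainPending() {
        List<String> batch = new ArrayList<>();
        deadServers.drainTo(batch);
        return batch;
    }

    /** Starts the single splitting thread; splitLogs is an assumed stand-in for the real split. */
    Thread startWorker(Consumer<List<String>> splitLogs) {
        Thread worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String first = deadServers.take(); // wait for at least one dead server
                    List<String> batch = drainPending(); // pick up any others already queued
                    batch.add(0, first);
                    splitLogs.accept(batch);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop when the master shuts down
            }
        }, "dead-server-log-splitter");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

With this shape, servers that die while a batch is being split simply accumulate
in the queue and form the next batch, so the master uses one splitting thread no
matter how many servers go down.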
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:333
Thank you for clarifying your concern :)
For each dead server, the master receives the znode expiration event for
that server only once.
So the master won't have two threads splitting the same dead region server's
log at the same time.
src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:337
Fine. I would change it to "Succeeded in splitting".
src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:347-348
Thanks for clarifying.
src/test/java/org/apache/hadoop/hbase/master/TestMultiRegionServerShutDown.java:135
Actually, I don't have to catch the exceptions here explicitly.
It won't affect the unit test results.
Thanks for the discussion.
src/main/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java:312
1) We use distributed log splitting for the dead region server as well.
2) Even though we use a thread pool for execution, I would bound the maximum
number of threads by the number of region servers.
What do you think the maximum number of threads for the executor thread pool
should be here?
Also, in the example you mentioned, 500 region servers go down. The master
would launch 500 threads to run distributed log splitting in parallel. That
shouldn't choke the master too much, since the split work is done on the
region server side.
3) But this discussion also leads us to another good point. Suppose a large
number of region servers die for some reason. Should we batch these dead
region servers and split them together, instead of splitting their logs in
parallel?
Any ideas, Mikhail and Prakash?
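
For comparison with the batching option, the bounded thread pool from point 2
might look like the sketch below. It is an assumption-laden illustration, not
the patch itself: BoundedSplitPool, splitAll, maxThreads, and the splitLog
callback are hypothetical names, and only standard java.util.concurrent calls
are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical sketch: one split task per dead server, run on a pool with a fixed upper bound. */
class BoundedSplitPool {
    /** Runs splitLog for every dead server on at most maxThreads threads; returns the success count. */
    static int splitAll(List<String> deadServers, int maxThreads, Consumer<String> splitLog) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        List<Future<?>> pending = new ArrayList<>();
        for (String server : deadServers) {
            // Tasks beyond maxThreads wait in the pool's queue instead of spawning new threads.
            pending.add(pool.submit(() -> splitLog.accept(server)));
        }
        pool.shutdown(); // accept no new tasks; already submitted ones still run

        int succeeded = 0;
        for (Future<?> task : pending) {
            try {
                task.get(); // wait for this server's split to finish
                succeeded++;
            } catch (ExecutionException e) {
                // The split for one server failed; a real master would have to retry or abort.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return succeeded;
    }
}
```

With 500 dead servers and, say, maxThreads set to 10, at most 10 splits are in
flight at once and the remaining tasks wait in the pool's queue.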
REVISION DETAIL
https://reviews.facebook.net/D237
> Split dead server's log in parallel
> -----------------------------------
>
> Key: HBASE-4742
> URL: https://issues.apache.org/jira/browse/HBASE-4742
> Project: HBase
> Issue Type: Improvement
> Reporter: Liyin Tang
> Assignee: Liyin Tang
> Attachments: D237.1.patch, D237.2.patch, D237.3.patch, D237.4.patch
>
>
> When one region server goes down, the master will shut down the region server
> and split its log.
> However, splitting the log is a blocking call and takes some time.
> If more than one region server goes down, the master will split their logs one
> by one, which is not efficient.
> Since we have distributed log splitting, we could split the logs from the
> dead servers in parallel.