[
https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jingyun Tian reassigned HBASE-19358:
------------------------------------
Assignee: Jingyun Tian
> Improve the stability of splitting log when do fail over
> --------------------------------------------------------
>
> Key: HBASE-19358
> URL: https://issues.apache.org/jira/browse/HBASE-19358
> Project: HBase
> Issue Type: Improvement
> Components: MTTR
> Affects Versions: 0.98.24
> Reporter: Jingyun Tian
> Assignee: Jingyun Tian
> Attachments: newLogic.jpg, previousLogic.jpg
>
>
> The way we split logs now is shown in the following figure:
> !https://issues.apache.org/jira/secure/attachment/12899558/previousLogic.jpg!
> The problem is that the OutputSink writes the recovered edits while the log
> is being split, which means it creates one WriterAndPath for each region. If
> the cluster is small and the number of regions per region server is large,
> this creates too many HDFS streams at the same time. Splitting is then prone
> to failure, since each datanode needs to handle too many streams.
> I therefore propose a new way to split the log:
> !https://issues.apache.org/jira/secure/attachment/12899557/newLogic.jpg!
> We cache the recovered edits until they exceed the memory limit we set or we
> reach the end of the log. A thread pool then does the rest: it writes the
> edits to files and moves the files to their destination.
> The biggest benefit is that we can control the number of streams created
> during log splitting: it will not exceed
> *_hbase.regionserver.wal.max.splitters *
> hbase.regionserver.hlog.splitlog.writer.threads_*, whereas before it was
> *_hbase.regionserver.wal.max.splitters * the number of regions the hlog
> contains_*.
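The proposed flow and the two stream bounds can be illustrated with a minimal Java sketch. This is not HBase code: the class `BufferedSplitSketch`, its methods, and the use of string length as a stand-in for edit size are all hypothetical, and the "write" step only counts edits instead of opening HDFS streams. It shows edits being buffered per region, flushed when an assumed memory limit is reached or at the end of the log, and written by a fixed-size pool so that the number of concurrently open streams cannot exceed the pool size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of buffered log splitting; not the HBase implementation. */
class BufferedSplitSketch {
    // Previous logic: one open writer per region, per concurrent splitter.
    static long oldBound(int maxSplitters, int regionsInLog) {
        return (long) maxSplitters * regionsInLog;
    }

    // Proposed logic: at most one open writer per writer thread, per splitter.
    static long newBound(int maxSplitters, int writerThreads) {
        return (long) maxSplitters * writerThreads;
    }

    private final Map<String, List<String>> buffer = new HashMap<>();
    private final long memoryLimit;
    private final ExecutorService writers;
    private final AtomicInteger openStreams = new AtomicInteger();
    private final AtomicInteger peakOpenStreams = new AtomicInteger();
    private final AtomicInteger editsWritten = new AtomicInteger();
    private long bufferedSize = 0;

    BufferedSplitSketch(long memoryLimit, int writerThreads) {
        this.memoryLimit = memoryLimit;
        this.writers = Executors.newFixedThreadPool(writerThreads);
    }

    /** Buffer one edit; flush once the assumed memory limit is reached. */
    void append(String region, String edit) {
        buffer.computeIfAbsent(region, r -> new ArrayList<>()).add(edit);
        bufferedSize += edit.length(); // string length stands in for heap size
        if (bufferedSize >= memoryLimit) {
            flush();
        }
    }

    /** End of the log: flush whatever is still buffered and stop the pool. */
    void finish() {
        flush();
        writers.shutdown();
    }

    private void flush() {
        List<Future<?>> pending = new ArrayList<>();
        for (List<String> edits : buffer.values()) {
            pending.add(writers.submit(() -> writeRegion(edits)));
        }
        for (Future<?> f : pending) {
            try {
                f.get(); // wait so the buffer can be cleared safely
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        }
        buffer.clear();
        bufferedSize = 0;
    }

    // Stand-in for "open a stream, write recovered edits, move to destination".
    private void writeRegion(List<String> edits) {
        int open = openStreams.incrementAndGet();
        peakOpenStreams.accumulateAndGet(open, Math::max);
        editsWritten.addAndGet(edits.size());
        openStreams.decrementAndGet();
    }

    int peakOpenStreams() { return peakOpenStreams.get(); }
    int editsWritten() { return editsWritten.get(); }
}
```

With illustrative values of 2 splitters, 3 writer threads, and a log containing edits for 50 regions, `oldBound(2, 50)` gives 100 streams while `newBound(2, 3)` gives 6. The trade-off in this sketch is that recovered edits are held in memory until a flush, so the memory limit has to be chosen with the heap available to the splitter in mind.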
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)