[
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135501#comment-15135501
]
Duo Zhang commented on HBASE-14790:
-----------------------------------
[~stack] Oh, there is a WALPE tool; I didn't know about it before. I have run a
randomWrite test in the PerformanceEvaluation tool...
The code quality is not good enough to merge yet. And there are two open
questions to settle before I start preparing a patch:
1. Where should we place the FanOut stream? I use a lot of reflection and some
test-only HDFS methods to implement the new stream. Since it could give us
better performance, is that enough for the HDFS guys to accept it as
part of the HDFS project?
2. I have not introduced a new WALProvider. Since the data is still written to
HDFS, I just introduced an AsyncFSHLog, which shares a base class
(AbstractFSHLog in the HBASE-14790 branch) with FSHLog, and added a flag to
tell DefaultWALProvider whether it should use FSHLog or AsyncFSHLog. I also
introduced a new AsyncWriter interface. The append method of AsyncWriter only
buffers data in memory. What do you think, [~stack] and [~busbey]? Do you guys
have other ideas about how to integrate the async logic into the WAL?
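To make item 2 easier to discuss, here is a minimal sketch of how such an
arrangement could look. It is illustrative only and is not the code in the
HBASE-14790 branch: the class names AbstractFSHLog, FSHLog, AsyncFSHLog and
AsyncWriter come from the description above, but every method signature, the
BufferingWriter stand-in, the createWal helper and the useAsyncWal flag are
assumptions, and nothing here touches HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class AsyncWalSketch {

  /** Hypothetical shape for AsyncWriter: append() only buffers, sync() flushes. */
  interface AsyncWriter {
    void append(byte[] entry);       // must not touch the file system
    CompletableFuture<Long> sync();  // completes with the total bytes flushed so far
  }

  /** Stand-in writer that "flushes" into a counter instead of an HDFS stream. */
  static class BufferingWriter implements AsyncWriter {
    private final List<byte[]> buffer = new ArrayList<>();
    private long flushedBytes = 0;

    @Override
    public synchronized void append(byte[] entry) {
      buffer.add(entry.clone()); // copy so the caller can reuse its array
    }

    @Override
    public synchronized CompletableFuture<Long> sync() {
      for (byte[] entry : buffer) {
        flushedBytes += entry.length;
      }
      buffer.clear();
      return CompletableFuture.completedFuture(flushedBytes);
    }

    synchronized int bufferedEntries() {
      return buffer.size();
    }
  }

  /** Stub for the shared base class; the real one would hold the common WAL logic. */
  abstract static class AbstractFSHLog {
    abstract String writerKind();
  }

  static class FSHLog extends AbstractFSHLog {
    @Override String writerKind() { return "sync"; }
  }

  static class AsyncFSHLog extends AbstractFSHLog {
    @Override String writerKind() { return "async"; }
  }

  /** Hypothetical flag-based selection, as a provider might do it. */
  static AbstractFSHLog createWal(boolean useAsyncWal) {
    return useAsyncWal ? new AsyncFSHLog() : new FSHLog();
  }

  public static void main(String[] args) {
    AbstractFSHLog wal = createWal(true);
    BufferingWriter writer = new BufferingWriter();
    writer.append(new byte[] {1, 2, 3});
    writer.append(new byte[] {4, 5});
    System.out.println(wal.writerKind() + " buffered=" + writer.bufferedEntries());
    System.out.println("flushed=" + writer.sync().join());
  }
}
```

The point of the sketch is the split of responsibilities: append is cheap and
never blocks on I/O, while sync is the only call that has to wait for the data
to become durable.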
Thanks.
> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all
> purposes. But in fact, we do not need most of the features if we only want to
> log WAL. For example, we do not need pipeline recovery since we could just
> close the old logger and open a new one. And also, we do not need to write
> multiple blocks since we could also open a new logger if the old file is too
> large.
> Most importantly, it is hard to handle all the corner
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when
> using the original {{DFSOutputStream}} due to its complicated logic. The
> complicated logic also forces us to use some magical tricks to increase
> performance. For example, we need to use multiple threads to call {{hflush}}
> when logging, and now we use 5 threads. But why 5 and not 10 or 100?
> So I propose that we implement our own {{DFSOutputStream}} for logging the
> WAL, for correctness and also for performance.