Duo Zhang created HBASE-14790:
---------------------------------
Summary: Implement a new DFSOutputStream for logging WAL only
Key: HBASE-14790
URL: https://issues.apache.org/jira/browse/HBASE-14790
Project: HBase
Issue Type: Improvement
Reporter: Duo Zhang
The original {{DFSOutputStream}} is very powerful and aims to serve all
purposes. But in fact, we do not need most of its features if we only want to
log WAL. For example, we do not need pipeline recovery, since we could just
close the old logger and open a new one. We also do not need to write
multiple blocks, since we could likewise open a new logger if the old file
grows too large.
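To make the "roll instead of recover" idea concrete, here is a minimal Java sketch. It assumes a hypothetical {{WalOutput}} abstraction (not a real HDFS or HBase API) whose {{flush()}} stands in for {{hflush}}. On any write or flush failure the writer simply closes the current log and retries the entry on a new one, and it also rolls once the file reaches a size limit, so neither pipeline recovery nor multi-block files are needed. It only illustrates the policy described above and is not a proposed implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a WAL-only writer: no pipeline recovery and no
// multi-block files. On a failure, or once the file is large enough, it
// closes the current log and opens a new one.
public class RollingWalSketch {

  /** Hypothetical minimal output abstraction (not a real HDFS/HBase API). */
  interface WalOutput {
    void write(byte[] entry) throws IOException;
    void flush() throws IOException; // stands in for hflush()
    long length();
    void close() throws IOException;
  }

  /** Hypothetical factory that creates a brand-new log file on each call. */
  interface WalOutputFactory {
    WalOutput create(int fileIndex) throws IOException;
  }

  static final class RollingWalWriter {
    private final WalOutputFactory factory;
    private final long maxFileSize;
    private WalOutput current;
    private int fileIndex = 0;
    private int rolls = 0;

    RollingWalWriter(WalOutputFactory factory, long maxFileSize) throws IOException {
      this.factory = factory;
      this.maxFileSize = maxFileSize;
      this.current = factory.create(fileIndex);
    }

    /** Writes and flushes one entry, rolling to a new file instead of recovering. */
    void append(byte[] entry) throws IOException {
      try {
        current.write(entry);
        current.flush();
      } catch (IOException e) {
        // No pipeline recovery: abandon the old file and retry once on a new one.
        roll();
        current.write(entry);
        current.flush();
      }
      if (current.length() >= maxFileSize) {
        roll(); // Start a new file rather than growing this one past the limit.
      }
    }

    private void roll() throws IOException {
      try {
        current.close();
      } catch (IOException ignored) {
        // Closing a broken file may fail; this sketch simply ignores that.
      }
      current = factory.create(++fileIndex);
      rolls++;
    }

    int rolls() {
      return rolls;
    }
  }

  /** In-memory stand-in used only for the demo; can inject one flush failure. */
  static final class MemOutput implements WalOutput {
    final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    boolean failNextFlush;

    public void write(byte[] entry) { buf.write(entry, 0, entry.length); }
    public void flush() throws IOException {
      if (failNextFlush) {
        failNextFlush = false;
        throw new IOException("simulated datanode failure");
      }
    }
    public long length() { return buf.size(); }
    public void close() {}
  }

  public static void main(String[] args) throws IOException {
    List<MemOutput> files = new ArrayList<>();
    RollingWalWriter wal = new RollingWalWriter(i -> {
      MemOutput out = new MemOutput();
      files.add(out);
      return out;
    }, 8);
    wal.append(new byte[3]);          // file 0 holds 3 bytes
    files.get(0).failNextFlush = true;
    wal.append(new byte[3]);          // flush fails: roll to file 1 and rewrite there
    wal.append(new byte[6]);          // file 1 reaches 9 >= 8 bytes: roll to file 2
    System.out.println(files.size() + " files, " + wal.rolls() + " rolls");
  }
}
```

Running {{main}} prints "3 files, 2 rolls": the injected flush failure triggers one roll and the size limit triggers the other. A real implementation would also have to deal with the partially written entry left in the abandoned file, which this sketch ignores.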
The most important point is that it is hard to handle all the corner cases
needed to avoid data loss or data inconsistency (such as HBASE-14004) when
using the original DFSOutputStream, because of its complicated logic. That
complicated logic also forces us to use some magical tricks to increase
performance. For example, we need to use multiple threads to call {{hflush}}
when logging, and we currently use 5 threads. But why 5 and not 10 or 100?
So I propose that we implement our own {{DFSOutputStream}} for logging WAL,
both for correctness and for performance.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)