[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003595#comment-15003595 ]
Duo Zhang commented on HBASE-14790:
-----------------------------------

HTTP/2 has its own problems: we haven't finished the read path yet, and the write protocol is much more complex than the read protocol... Also, if we plan to do it in HDFS, I think we should make it more general, and it is better to design a good event-driven FileSystem interface from the beginning. I do not think either of them is easy...

So my plan is to first implement a simple version in HBase that is only compatible with Hadoop 2.x, make sure it has some benefits, and actually ship it with HBase. Then we could start implementing a more general and more powerful event-driven FileSystem in HDFS. When the new FileSystem is out, we could move HBase to the new FileSystem in HDFS and drop the old simple version.

What do you think? [~wheat9]

Thanks.

> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all
> purposes. But in fact, we do not need most of the features if we only want to
> log WAL. For example, we do not need pipeline recovery since we could just
> close the old logger and open a new one. And also, we do not need to write
> multiple blocks since we could also open a new logger if the old file is too
> large.
> And the most important thing is that it is hard to handle all the corner
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when
> using the original {{DFSOutputStream}} due to its complicated logic. And the
> complicated logic also forces us to use some magical tricks to increase
> performance. For example, we need to use multiple threads to call {{hflush}}
> when logging, and now we use 5 threads. But why 5, not 10 or 100?
>
> So here, I propose we should implement our own {{DFSOutputStream}} when
> logging WAL. For correctness, and also for performance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
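The failure-handling argument in the quoted description (no pipeline recovery and no multi-block files: on an error, or once the file is large enough, close the current log and open a new one) can be sketched in Java. This is a minimal sketch under stated assumptions, not the proposed implementation: `LogFile`, `LogFileFactory`, `RollingWalWriter`, and `InMemoryLogFile` are hypothetical names, not HBase or HDFS APIs; `sync()` stands in for `hflush`; `rollSize` is assumed to fit in one HDFS block; and WAL replay is assumed to tolerate an entry that may also have reached the abandoned file.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over a single WAL file (NOT an HBase/HDFS API).
interface LogFile {
    void append(byte[] entry) throws IOException;
    void sync() throws IOException; // stands in for hflush
    long length();
    void close(); // best effort; an abandoned file may already be broken
}

// Hypothetical factory that opens a brand-new log file for each sequence number.
interface LogFileFactory {
    LogFile create(int sequence) throws IOException;
}

// Never repairs a pipeline and never lets a file grow past rollSize:
// both cases are handled the same way, by rolling to a new file.
final class RollingWalWriter {
    private final LogFileFactory factory;
    private final long rollSize; // assumed to fit in one HDFS block
    private LogFile current;
    private int sequence = 0;

    RollingWalWriter(LogFileFactory factory, long rollSize) throws IOException {
        this.factory = factory;
        this.rollSize = rollSize;
        this.current = factory.create(sequence);
    }

    void write(byte[] entry) throws IOException {
        try {
            current.append(entry);
            current.sync();
        } catch (IOException e) {
            // Instead of pipeline recovery: abandon the file and retry once on a new one.
            // The entry may also survive in the abandoned file; replay is assumed to dedupe.
            roll();
            current.append(entry);
            current.sync(); // a second failure propagates to the caller
        }
        if (current.length() >= rollSize) {
            roll(); // start a new log rather than a second block in the same file
        }
    }

    private void roll() throws IOException {
        current.close();
        sequence++;
        current = factory.create(sequence);
    }

    int sequence() { return sequence; }
}

// In-memory stand-in used to exercise the writer; can simulate one failed sync.
final class InMemoryLogFile implements LogFile {
    final List<byte[]> entries = new ArrayList<>();
    boolean failNextSync = false;
    boolean closed = false;
    private long bytes = 0;

    @Override public void append(byte[] entry) { entries.add(entry); bytes += entry.length; }

    @Override public void sync() throws IOException {
        if (failNextSync) {
            failNextSync = false;
            throw new IOException("simulated pipeline failure");
        }
    }

    @Override public long length() { return bytes; }

    @Override public void close() { closed = true; }
}
```

In this sketch a size-triggered roll and a failure-triggered roll share one code path, which is the simplification the description argues for; the cost is that every failed sync opens a new file.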