[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003595#comment-15003595
 ] 

Duo Zhang commented on HBASE-14790:
-----------------------------------

HTTP/2 has its own problems: we have not finished the read path yet, and the 
write protocol is much more complex than the read protocol... Also, if we plan 
to do it in HDFS, I think we should make it more general, and it would be 
better to design a good event-driven FileSystem interface from the beginning. 
I do not think either of them is easy...

So my plan is to first implement a simple version in HBase that is only 
compatible with Hadoop 2.x, make sure it has some benefits, and actually ship 
it with HBase. Then we could start implementing a more general and more 
powerful event-driven FileSystem in HDFS. When the new FileSystem is out, we 
could move HBase to it and drop the old simple version.

What do you think? [~wheat9]

Thanks.

> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes. But in fact, we do not need most of the features if we only want to 
> log WAL. For example, we do not need pipeline recovery since we could just 
> close the old logger and open a new one. And also, we do not need to write 
> multiple blocks since we could also open a new logger if the old file is too 
> large.
> And the most important thing is that it is hard to handle all the corner 
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when 
> using the original {{DFSOutputStream}}, due to its complicated logic. The 
> complicated logic also forces us to use some magic tricks to improve 
> performance. For example, we need to use multiple threads to call {{hflush}} 
> when logging, and now we use 5 threads. But why 5 and not 10 or 100?
> So here, I propose that we implement our own {{DFSOutputStream}} for 
> logging the WAL, for correctness and also for performance.
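
The "close the old logger and open a new one" idea in the description can be sketched roughly as below. This is only an illustration under assumptions, not the proposed implementation: {{SimpleLogWriter}}, {{RollingWal}} and {{InMemoryWriter}} are hypothetical names, not HBase or HDFS APIs, and the in-memory writer merely stands in for a single-block output stream. Each append is synced before returning, so at most one entry is in flight when a roll happens; a real writer would have to replay every unacknowledged entry onto the new file.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical single-file, single-block log writer (not an HBase/HDFS API). */
interface SimpleLogWriter {
    void append(byte[] entry) throws IOException;
    void sync() throws IOException;
    long length();
    void close();
}

/** Rolls to a fresh writer instead of recovering a failed one or exceeding maxBytes. */
final class RollingWal {
    private final Supplier<SimpleLogWriter> factory;
    private final long maxBytes;
    private SimpleLogWriter current;
    private int rolls = 0;

    RollingWal(Supplier<SimpleLogWriter> factory, long maxBytes) {
        this.factory = factory;
        this.maxBytes = maxBytes;
        this.current = factory.get();
    }

    void appendAndSync(byte[] entry) throws IOException {
        // No multi-block support: open a new file when this one would grow too large.
        if (current.length() + entry.length > maxBytes) {
            roll();
        }
        try {
            current.append(entry);
            current.sync();
        } catch (IOException e) {
            // No pipeline recovery: abandon the writer and retry once on a new file.
            roll();
            current.append(entry);
            current.sync();
        }
    }

    private void roll() {
        current.close();
        current = factory.get();
        rolls++;
    }

    int rolls() {
        return rolls;
    }
}

/** In-memory stand-in used only to exercise the rolling logic. */
final class InMemoryWriter implements SimpleLogWriter {
    final List<byte[]> entries = new ArrayList<>();
    private long bytes = 0;
    boolean failNextSync = false; // set to true to simulate a write failure

    public void append(byte[] entry) {
        entries.add(entry);
        bytes += entry.length;
    }

    public void sync() throws IOException {
        if (failNextSync) {
            throw new IOException("simulated datanode failure");
        }
    }

    public long length() {
        return bytes;
    }

    public void close() {
    }
}
```

With this shape, both the size limit and a write failure go through the same roll path, which is the simplification the description argues for.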



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
