[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035230#comment-15035230
 ] 

Duo Zhang commented on HBASE-14790:
-----------------------------------

It basically works. I added a test for it:

https://github.com/Apache9/hbase/blob/HBASE-14790/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestFanOutOneBlockDFSOutputStream.java

{quote}
So you "fanout" for every packet with multiple threads. 
{quote}
No, I use only one {{EventLoop}} which means there is only one thread.
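
To illustrate the point, here is a minimal sketch of fanning each packet out
to several replicas from a single thread. It is not the code in the branch:
a plain single-thread {{ExecutorService}} stands in for the netty
{{EventLoop}}, and {{SingleLoopFanOut}}, {{Replica}} and {{run}} are
hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SingleLoopFanOut {
    /** Hypothetical stand-in for one datanode connection; records what it received. */
    static final class Replica {
        final String name;
        final List<String> received = new ArrayList<>(); // only touched by the loop thread
        Replica(String name) { this.name = name; }
    }

    final List<Replica> replicas = new ArrayList<>();
    // Names of all threads that performed a write (expected to contain one name).
    final Set<String> writerThreads = ConcurrentHashMap.newKeySet();

    /** Sends every packet to every replica, with all writes done on one loop thread. */
    static SingleLoopFanOut run(List<String> packets, int replicaCount) {
        SingleLoopFanOut state = new SingleLoopFanOut();
        for (int i = 0; i < replicaCount; i++) {
            state.replicas.add(new Replica("dn" + i));
        }
        // Assumption: a single-thread executor plays the role of the EventLoop.
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String packet : packets) {
                // One task per packet; the task itself fans out to all replicas.
                pending.add(loop.submit(() -> {
                    for (Replica r : state.replicas) {
                        r.received.add(packet);
                        state.writerThreads.add(Thread.currentThread().getName());
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the write and make its effects visible here
            }
        } catch (Exception e) {
            throw new IllegalStateException("fan-out failed", e);
        } finally {
            loop.shutdown();
        }
        return state;
    }

    public static void main(String[] args) {
        SingleLoopFanOut result = run(Arrays.asList("p1", "p2", "p3"), 3);
        for (Replica r : result.replicas) {
            System.out.println(r.name + " received " + r.received);
        }
        System.out.println("distinct writer threads: " + result.writerThreads.size());
    }
}
```

Because every write runs on the same thread, the per-replica state needs no
locking, which is the property a single {{EventLoop}} gives.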

{quote}
Simply closing the file without bumping GS will cause data corruption.
{quote}
This does not make sense. What if the client crashes before bumping the GS?

{quote}
then I think we should still separate OutputStream and DataStreamer logics
{quote}
Maybe you are right, but I'm not sure what the correct way is, since there is 
no netty-based DFSClient yet. I need to find a way to make it basically work 
first, and then try to abstract the logic.

Next I will dig into the error handling part. Thanks.



> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes. But in fact, we do not need most of the features if we only want to 
> log WAL. For example, we do not need pipeline recovery since we could just 
> close the old logger and open a new one. And also, we do not need to write 
> multiple blocks since we could also open a new logger if the old file is too 
> large.
> And most importantly, it is hard to handle all the corner 
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when 
> using the original {{DFSOutputStream}} due to its complicated logic. The 
> complicated logic also forces us to use some magic tricks to increase 
> performance. For example, we need to use multiple threads to call {{hflush}} 
> when logging, and now we use 5 threads. But why 5 and not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when 
> logging WAL. For correctness, and also for performance.
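
The "close the old logger and open a new one" idea from the description can
be sketched roughly as follows. This is only an illustration under
assumptions: {{RollingWalSketch}} and {{LogFile}} are hypothetical in-memory
stand-ins, character count stands in for byte size, and a real writer would
also have to re-send any entries that were not yet acknowledged.

```java
import java.util.ArrayList;
import java.util.List;

final class RollingWalSketch {
    /** Hypothetical single-block log file that fails instead of recovering. */
    static final class LogFile {
        final List<String> entries = new ArrayList<>();
        long bytes = 0;          // character count stands in for bytes
        boolean closed = false;
        boolean broken = false;  // set by the caller to simulate a pipeline failure

        void append(String entry) {
            if (broken) {
                throw new IllegalStateException("pipeline failed");
            }
            entries.add(entry);
            bytes += entry.length();
        }
    }

    final long maxBytes;
    final List<LogFile> files = new ArrayList<>();
    LogFile current;

    RollingWalSketch(long maxBytes) {
        this.maxBytes = maxBytes;
        roll();
    }

    /** Closes the current file (if any) and opens a fresh one. */
    void roll() {
        if (current != null) {
            current.closed = true;
        }
        current = new LogFile();
        files.add(current);
    }

    void append(String entry) {
        // Size-based roll: never grow a non-empty file past maxBytes.
        if (current.bytes > 0 && current.bytes + entry.length() > maxBytes) {
            roll();
        }
        try {
            current.append(entry);
        } catch (IllegalStateException e) {
            // No pipeline recovery: abandon the file and retry on a new one.
            roll();
            current.append(entry);
        }
    }
}
```

The same {{roll()}} path serves both cases, a file that is too large and a
file whose pipeline failed, so no separate recovery logic is needed in this
sketch.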



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
