[
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041910#comment-15041910
]
Phil Yang commented on HBASE-14790:
-----------------------------------
Currently there are two scenarios which may result in inconsistency between the two
clusters.
The first is when the master cluster crashes (for example, a power failure), or when
three DNs and the RS crash at the same time, so that we lose all data that has not
been flushed to the DNs' disks even though it has already been synced to the slave
cluster.
The second is that we roll back the memstore and return an error to the client if we
get an error on hflush, but the entry may in fact exist in the WAL. This not only
results in inconsistency between the two clusters but also gives the client a wrong
response, because the data will "revive" after the WAL is replayed. This scenario has
been discussed in HBASE-14004.
Compared to the second, the first scenario is easier to solve: we can tell
ReplicationSource to only read the log entries that have already been saved on three
disks. For that we need to know the largest WAL entry id that has been synced, so
HDFS's own internal sync logic may not be enough for us and we must use hsync so that
HBase knows the entry id. That means we need a configurable, periodic hsync here; even
with only one cluster it would also help reduce data loss from a data center power
failure or from three DNs and the RS unluckily crashing at the same time. I think this
work can be done without the new DFSOutputStream?
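A rough sketch of the periodic-hsync idea, assuming entries carry a monotonically
increasing sequence id; the class name {{SyncedSeqIdTracker}} and the way the last
written sequence id is obtained are made up for illustration, not a proposal for the
actual implementation:
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.fs.FSDataOutputStream;

// Keeps a "durable watermark": the highest WAL sequence id known to be on the
// DataNodes' disks. A replication reader could stop at this watermark so it never
// ships an entry that the source cluster might still lose on a power failure.
public class SyncedSeqIdTracker {
  private final FSDataOutputStream walStream;   // stream the WAL writer appends to
  private final AtomicLong lastWrittenSeqId;    // advanced by the WAL append path
  private final AtomicLong lastHsyncedSeqId = new AtomicLong(-1);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public SyncedSeqIdTracker(FSDataOutputStream walStream, AtomicLong lastWrittenSeqId,
      long hsyncPeriodMs) {
    this.walStream = walStream;
    this.lastWrittenSeqId = lastWrittenSeqId;
    scheduler.scheduleAtFixedRate(this::hsyncOnce, hsyncPeriodMs, hsyncPeriodMs,
        TimeUnit.MILLISECONDS);
  }

  private void hsyncOnce() {
    // Remember what was written before the sync started; everything up to this id
    // is durable once hsync returns successfully.
    long seqIdBeforeSync = lastWrittenSeqId.get();
    try {
      walStream.hsync();
      lastHsyncedSeqId.set(seqIdBeforeSync);
    } catch (Exception e) {
      // Do not advance the watermark; the normal WAL error handling (log roll etc.)
      // is expected to deal with the broken stream.
    }
  }

  /** ReplicationSource would only read entries up to this sequence id. */
  public long getLastHsyncedSeqId() {
    return lastHsyncedSeqId.get();
  }

  public void stop() {
    scheduler.shutdown();
  }
}
{code}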
The second scenario is more complex, because we cannot roll back the memstore and tell
the client that the operation failed unless we are very sure the data will never exist
in the WAL, and mostly we are not sure... So we have to use new WAL logic that rewrites
the entry to a new file rather than rolling back. To implement this we need to handle
duplicate entries while replaying the WAL. I think this logic does not conflict with
the pipelined DFSOutputStream, so we could actually fix it on the current WAL
implementation?
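To make the duplicate-handling part concrete, here is a minimal sketch of
deduplication during replay, again assuming a per-region, monotonically increasing
sequence id on every entry; {{WalEntry}} and {{applyToMemstore}} are placeholder names
for illustration, not real HBase classes:
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// If the same edit can appear in two WAL files (because it was rewritten to a new
// file after a failed hflush), replay can drop duplicates by remembering the highest
// sequence id already applied for each region.
public class DedupReplayer {

  /** Minimal stand-in for a WAL entry: region name + sequence id + payload. */
  public static final class WalEntry {
    final String regionName;
    final long seqId;
    final byte[] edit;

    WalEntry(String regionName, long seqId, byte[] edit) {
      this.regionName = regionName;
      this.seqId = seqId;
      this.edit = edit;
    }
  }

  // Highest sequence id already replayed per region.
  private final Map<String, Long> maxAppliedSeqId = new HashMap<>();

  /** Replays entries in order, silently skipping ones that were already applied. */
  public void replay(List<WalEntry> entries) {
    for (WalEntry entry : entries) {
      long applied = maxAppliedSeqId.getOrDefault(entry.regionName, -1L);
      if (entry.seqId <= applied) {
        continue; // duplicate: the edit was rewritten into a newer WAL file
      }
      applyToMemstore(entry);
      maxAppliedSeqId.put(entry.regionName, entry.seqId);
    }
  }

  private void applyToMemstore(WalEntry entry) {
    // Placeholder for the real replay path.
  }
}
{code}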
And this issue, HBASE-14790, may then be only a performance improvement that does not
fix any bugs? Of course, the FanOutOneBlockDFSOutputStream should implement the new
WAL logic directly.
[~Apache9] What do you think?
> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all
> purposes. But in fact, we do not need most of the features if we only want to
> log WAL. For example, we do not need pipeline recovery since we could just
> close the old logger and open a new one. And also, we do not need to write
> multiple blocks since we could also open a new logger if the old file is too
> large.
> And the most important thing is that it is hard to handle all the corner
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when
> using the original DFSOutputStream, due to its complicated logic. And the
> complicated logic also forces us to use some magical tricks to increase
> performance. For example, we need to use multiple threads to call {{hflush}}
> when logging, and now we use 5 threads. But why 5, and not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when
> logging WAL. For correctness, and also for performance.
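A minimal sketch of the "just close the old logger and open a new one" idea from the
description above: instead of pipeline recovery or multi-block files, the writer below
rolls to a fresh file on any write error or once the current file grows past a size
limit. {{SimpleWalRoller}} and the path scheme are invented for the example and are
not the proposed implementation:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SimpleWalRoller {
  private final FileSystem fs;
  private final long maxFileSize;
  private FSDataOutputStream out;

  public SimpleWalRoller(FileSystem fs, long maxFileSize) throws IOException {
    this.fs = fs;
    this.maxFileSize = maxFileSize;
    this.out = fs.create(newWalPath());
  }

  public synchronized void append(byte[] entry) throws IOException {
    try {
      out.write(entry);
      out.hflush();
      if (out.getPos() >= maxFileSize) {
        roll(); // file is large enough: start a new single-block file
      }
    } catch (IOException e) {
      roll();  // on failure, abandon the old stream instead of recovering the pipeline
      throw e; // the caller decides whether to rewrite the entry into the new file
    }
  }

  private void roll() throws IOException {
    try {
      out.close();
    } catch (IOException ignored) {
      // The old file would be finished by lease recovery / log splitting later.
    }
    out = fs.create(newWalPath());
  }

  private Path newWalPath() {
    return new Path("/tmp/wal-sketch/" + System.currentTimeMillis() + ".wal");
  }
}
{code}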
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)