[
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041910#comment-15041910
]
Phil Yang commented on HBASE-14790:
-----------------------------------
Currently there are two scenarios which may result in inconsistency between the two
clusters.
The first is when the master cluster crashes (for example, a power failure), or when
three DNs and the RS crash at the same time, so that we lose all data that has not
been flushed to the DNs' disks even though it has already been synced to the slave
cluster.
The second is that we roll back the memstore and return an error to the client if we
get an error on hflush, but the entry may in fact exist in the WAL. This not only
results in inconsistency between the two clusters but also gives the client a wrong
response, because the data will "revive" after the WAL is replayed. This scenario has
been discussed in HBASE-14004.
Compared to the second, the first scenario is easier to solve: we can tell
ReplicationSource to only read the log entries that have already been saved on three
disks. For that we need to know the largest WAL entry id that has been synced, so
HDFS's own internal sync logic may not be enough for us and we must use hsync so that
HBase knows the entry id. That means we need a configurable, periodic hsync here; even
with only one cluster it would also help reduce data loss from a data center power
failure or from three DNs and the RS unluckily crashing at the same time. I think this
work can be done without the new DFSOutputStream?
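A rough sketch of the periodic-hsync idea, assuming entries carry a monotonically
increasing sequence id; the class name {{SyncedSeqIdTracker}} and the way the last
written sequence id is obtained are made up for illustration, not a proposal for the
actual implementation:
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.fs.FSDataOutputStream;

// Keeps a "durable watermark": the highest WAL sequence id known to be on the
// DataNodes' disks. A replication reader could stop at this watermark so it never
// ships an entry that the source cluster might still lose on a power failure.
public class SyncedSeqIdTracker {
  private final FSDataOutputStream walStream;   // stream the WAL writer appends to
  private final AtomicLong lastWrittenSeqId;    // advanced by the WAL append path
  private final AtomicLong lastHsyncedSeqId = new AtomicLong(-1);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public SyncedSeqIdTracker(FSDataOutputStream walStream, AtomicLong lastWrittenSeqId,
      long hsyncPeriodMs) {
    this.walStream = walStream;
    this.lastWrittenSeqId = lastWrittenSeqId;
    scheduler.scheduleAtFixedRate(this::hsyncOnce, hsyncPeriodMs, hsyncPeriodMs,
        TimeUnit.MILLISECONDS);
  }

  private void hsyncOnce() {
    // Remember what was written before the sync started; everything up to this id
    // is durable once hsync returns successfully.
    long seqIdBeforeSync = lastWrittenSeqId.get();
    try {
      walStream.hsync();
      lastHsyncedSeqId.set(seqIdBeforeSync);
    } catch (Exception e) {
      // Do not advance the watermark; the normal WAL error handling (log roll etc.)
      // is expected to deal with the broken stream.
    }
  }

  /** ReplicationSource would only read entries up to this sequence id. */
  public long getLastHsyncedSeqId() {
    return lastHsyncedSeqId.get();
  }

  public void stop() {
    scheduler.shutdown();
  }
}
{code}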
The second scenario is more complex, because we cannot roll back the memstore and tell
the client that the operation failed unless we are very sure the data will never exist
in the WAL, and mostly we are not sure... So we have to use new WAL logic that rewrites
the entry to a new file rather than rolling back. To implement this we need to handle
duplicate entries while replaying the WAL. I think this logic does not conflict with
the pipelined DFSOutputStream, so we could actually fix it on the current WAL
implementation?
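To make the duplicate-handling part concrete, here is a minimal sketch of
deduplication during replay, again assuming a per-region, monotonically increasing
sequence id on every entry; {{WalEntry}} and {{applyToMemstore}} are placeholder names
for illustration, not real HBase classes:
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// If the same edit can appear in two WAL files (because it was rewritten to a new
// file after a failed hflush), replay can drop duplicates by remembering the highest
// sequence id already applied for each region.
public class DedupReplayer {

  /** Minimal stand-in for a WAL entry: region name + sequence id + payload. */
  public static final class WalEntry {
    final String regionName;
    final long seqId;
    final byte[] edit;

    WalEntry(String regionName, long seqId, byte[] edit) {
      this.regionName = regionName;
      this.seqId = seqId;
      this.edit = edit;
    }
  }

  // Highest sequence id already replayed per region.
  private final Map<String, Long> maxAppliedSeqId = new HashMap<>();

  /** Replays entries in order, silently skipping ones that were already applied. */
  public void replay(List<WalEntry> entries) {
    for (WalEntry entry : entries) {
      long applied = maxAppliedSeqId.getOrDefault(entry.regionName, -1L);
      if (entry.seqId <= applied) {
        continue; // duplicate: the edit was rewritten into a newer WAL file
      }
      applyToMemstore(entry);
      maxAppliedSeqId.put(entry.regionName, entry.seqId);
    }
  }

  private void applyToMemstore(WalEntry entry) {
    // Placeholder for the real replay path.
  }
}
{code}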
And this issue, HBASE-14790, may then be only a performance improvement that does not
fix any bugs? Of course, the FanOutOneBlockDFSOutputStream should implement the new
WAL logic directly.
[~Apache9] What do you think?
> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all
> purposes. But in fact, we do not need most of the features if we only want to
> log WAL. For example, we do not need pipeline recovery since we could just
> close the old logger and open a new one. And also, we do not need to write
> multiple blocks since we could also open a new logger if the old file is too
> large.
> And the most important thing is that it is hard to handle all the corner
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when
> using the original DFSOutputStream, due to its complicated logic. And the
> complicated logic also forces us to use some magical tricks to increase
> performance. For example, we need to use multiple threads to call {{hflush}}
> when logging, and now we use 5 threads. But why 5, and not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when
> logging WAL. For correctness, and also for performance.
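A minimal sketch of the "just close the old logger and open a new one" idea from the
description above: instead of pipeline recovery or multi-block files, the writer below
rolls to a fresh file on any write error or once the current file grows past a size
limit. {{SimpleWalRoller}} and the path scheme are invented for the example and are
not the proposed implementation:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SimpleWalRoller {
  private final FileSystem fs;
  private final long maxFileSize;
  private FSDataOutputStream out;

  public SimpleWalRoller(FileSystem fs, long maxFileSize) throws IOException {
    this.fs = fs;
    this.maxFileSize = maxFileSize;
    this.out = fs.create(newWalPath());
  }

  public synchronized void append(byte[] entry) throws IOException {
    try {
      out.write(entry);
      out.hflush();
      if (out.getPos() >= maxFileSize) {
        roll(); // file is large enough: start a new single-block file
      }
    } catch (IOException e) {
      roll();  // on failure, abandon the old stream instead of recovering the pipeline
      throw e; // the caller decides whether to rewrite the entry into the new file
    }
  }

  private void roll() throws IOException {
    try {
      out.close();
    } catch (IOException ignored) {
      // The old file would be finished by lease recovery / log splitting later.
    }
    out = fs.create(newWalPath());
  }

  private Path newWalPath() {
    return new Path("/tmp/wal-sketch/" + System.currentTimeMillis() + ".wal");
  }
}
{code}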
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)