[
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044319#comment-15044319
]
Duo Zhang commented on HBASE-14004:
-----------------------------------
{quote}
Does it mean that some entries which may already be in place (you have told the
client the mutation succeeded and the data was really in place on the DNs
already) will be lost?
{quote}
If all 3 DNs crash and the RS also crashes, then yes, there is no way to
prevent this short of calling hsync on every write. And I think this is not
related to the scenario you described: we do not return SUCCESS to the client
until we have received a successful hflush response.
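For reference, the hflush/hsync distinction is visible in the public HDFS
{{Syncable}} API. A minimal sketch, assuming a plain HDFS client setup (the
path and payload here are made up, not HBase's WAL code):
{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalSyncSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-sketch"))) {
      out.write("edit".getBytes(StandardCharsets.UTF_8));

      // hflush(): pushes the data to every DataNode in the pipeline, but the
      // DNs may still hold it only in memory. If all replicas crash before
      // the OS writes it out, the edit is lost even though hflush succeeded.
      out.hflush();

      // hsync(): additionally asks each DataNode to fsync to disk, so the
      // edit survives a crash of all replicas, at a large per-call latency
      // cost. This is the "hsync every time" option mentioned above.
      out.hsync();
    }
  }
}
{code}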
What [~liyu] said is a different problem. Flushing the WAL asynchronously (or
not writing the WAL at all) is known to carry the risk of losing data. We need
to determine what we can guarantee when users run with these configurations.
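For context, these configurations correspond to the per-mutation
{{Durability}} levels in the HBase client API. A minimal sketch, with
placeholder table, family, and qualifier names:
{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DurabilitySketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("t"))) {
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));

      // ASYNC_WAL: the edit is appended to the WAL but the sync is deferred,
      // so an RS crash can lose edits already acked to the client.
      // SKIP_WAL drops the WAL write entirely and is weaker still.
      put.setDurability(Durability.ASYNC_WAL);
      table.put(put);
    }
  }
}
{code}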
> [Replication] Inconsistency between Memstore and WAL may result in data in
> remote cluster that is not in the origin
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: He Liangliang
> Priority: Critical
> Labels: replication, wal
>
> Looks like the current write path can cause an inconsistency between the
> Memstore/HFile and the WAL, which can leave the slave cluster with more data
> than the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails even though the data has
> already been (perhaps partially) transported to the DNs, where it eventually
> gets persisted. As a result, the handler will roll back the Memstore, and
> the HFile flushed later will also skip this record.
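To make the described race concrete, here is a hypothetical sketch of the four
steps above; the {{Memstore}}/{{Wal}} interfaces and method names are
illustrative only, not HBase's actual internals:
{code:java}
import java.io.IOException;

// Illustrative stand-ins for the real components, not HBase's classes.
interface Memstore {
  long insert(byte[] edit);        // returns a sequence id used for rollback
  void rollback(long seqId);
}

interface Wal {
  long append(byte[] edit) throws IOException;  // returns a txid
  void sync(long txid) throws IOException;      // hflush to the DN pipeline
}

class WritePathSketch {
  private final Memstore memstore;
  private final Wal wal;

  WritePathSketch(Memstore memstore, Wal wal) {
    this.memstore = memstore;
    this.wal = wal;
  }

  void applyMutation(byte[] edit) throws IOException {
    long seqId = memstore.insert(edit);   // 1. insert record into Memstore
    long txid = wal.append(edit);         // 2. write record to WAL
    try {
      wal.sync(txid);                     // 3. sync WAL (hflush)
    } catch (IOException e) {
      // 4. rollback Memstore because the sync "failed". But the sync RPC can
      // fail after the bytes reached the DataNodes, so the WAL entry may
      // still be persisted and shipped to the slave cluster by replication,
      // while this rollback removes the edit from the master's Memstore and
      // from any HFile flushed later.
      memstore.rollback(seqId);
      throw e;
    }
  }
}
{code}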