[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048137#comment-15048137 ]
Phil Yang commented on HBASE-14004:
-----------------------------------
{quote}
And in this issue, i suppose we just ensure correctness with hflush, wdyt?
{quote}
Agreed :) We can focus on the inconsistency between the WAL and the Memstore,
which also results in inconsistency between master and slave. And if we don't
use hsync, I think we need not change the logic of the replicator, which means
we needn't restrict it to transferring only data that has been hflushed,
because all entries in the WAL should eventually be in the memstore, right?
So we may have only two subtasks now:
1: The WAL reader can handle duplicate entries; in other words, make WAL logging
idempotent.
2: The WAL logger does not throw an exception if it cannot be sure whether the
entry was saved on HDFS (for example, on an hflush timeout); instead it retries
writing the entry to a new file and closes the old file asynchronously.
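A minimal sketch of subtask 1, assuming every WAL entry carries a sequence id
that is unique per edit, so an entry that was retried into a new file shows up
with the same id in both the old and the new file. WalReplaySketch, Entry and
replay are hypothetical names for illustration, not HBase classes:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not HBase code: all names here are invented.
class WalReplaySketch {
    // A WAL entry; seqId is ASSUMED to be unique per edit.
    static final class Entry {
        final long seqId;
        final String row;
        final String value;

        Entry(long seqId, String row, String value) {
            this.seqId = seqId;
            this.row = row;
            this.value = value;
        }
    }

    // Returns the entries to apply, keeping only the first occurrence
    // of each sequence id so that replaying duplicates is harmless.
    static List<Entry> replay(List<Entry> entries) {
        Set<Long> seen = new HashSet<>();
        List<Entry> applied = new ArrayList<>();
        for (Entry e : entries) {
            if (seen.add(e.seqId)) { // add() returns false for a duplicate id
                applied.add(e);
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        List<Entry> wal = new ArrayList<>();
        wal.add(new Entry(1, "r1", "a"));
        wal.add(new Entry(2, "r2", "b")); // old file, hflush outcome unknown
        wal.add(new Entry(2, "r2", "b")); // same edit retried into a new file
        wal.add(new Entry(3, "r3", "c"));
        System.out.println(replay(wal).size()); // prints 3
    }
}
```

With this kind of reader, the logger is free to write the same entry twice
when it cannot tell whether the first write reached HDFS.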
I changed the description of the second subtask because we needn't care about
the logic of HRegion, which is now "write memstore->write wal->rollback if wal
fails" and may be changed to "write wal->write memstore if wal succeeds".
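A rough sketch of the "write wal->write memstore if wal succeeds" order
combined with the retry from subtask 2. Everything here is hypothetical
(WriteOrderSketch, WalFile, SyncResult, the maxRolls limit); it only
illustrates the idea and is not how HRegion or the WAL is implemented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch, not HBase code: all names here are invented.
class WriteOrderSketch {
    // UNKNOWN models a case such as an hflush timeout, where the writer
    // cannot tell whether the entry reached HDFS.
    enum SyncResult { OK, UNKNOWN }

    interface WalFile {
        SyncResult appendAndFlush(String edit);
    }

    // Writes the edit to the WAL first and touches the memstore only after
    // a flush is known to have succeeded. On an unknown outcome it asks
    // nextFile for a fresh file and retries there; a real implementation
    // would close the old file asynchronously.
    static boolean write(String edit, Supplier<WalFile> nextFile,
                         List<String> memstore, int maxRolls) {
        for (int attempt = 0; attempt <= maxRolls; attempt++) {
            WalFile file = nextFile.get();
            if (file.appendAndFlush(edit) == SyncResult.OK) {
                memstore.add(edit); // no rollback path is needed
                return true;
            }
        }
        return false; // the memstore was never modified
    }

    public static void main(String[] args) {
        List<String> memstore = new ArrayList<>();
        int[] calls = {0};
        // The first file times out, the second one succeeds.
        Supplier<WalFile> files = () -> {
            int n = calls[0]++;
            return e -> n == 0 ? SyncResult.UNKNOWN : SyncResult.OK;
        };
        System.out.println(write("put r1", files, memstore, 2)); // prints true
    }
}
```

Because the entry may now exist in both the old and the new file, this only
works together with a reader that tolerates duplicates.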
> [Replication] Inconsistency between Memstore and WAL may result in data in
> remote cluster that is not in the origin
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: He Liangliang
> Priority: Critical
> Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between
> memstore/hfile and WAL, which causes the slave cluster to have more data than
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data has already
> been (perhaps partially) transported to the DNs, which finally persist it. As
> a result, the handler will roll back the Memstore and the later flushed HFile
> will also skip this record.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)