[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044616#comment-15044616 ]
He Liangliang commented on HBASE-14004:
---------------------------------------

Sorry for the late reply. What about this approach:

1. Keep the current synced sequence id in memory and make sure the replicator does not read past this id when replicating.
2. When rolling the WAL, record the final id in the ZK replication queue and also mark the file as rolled in memory, so the local replicator knows the file is finished.
3. When the replication queue fails over, the new replicator checks the recorded sequence id: if it is present, the file was successfully rolled; otherwise, read until EOF. In the latter case, we need to make sure the edits after the last successful sync of that log are replayed to ensure consistency. Recording the last successfully synced sequence id when flushing guarantees this.

The overhead is insignificant (just a memory barrier for the volatile sequence id passed to the replicator). This may have been the original design, since there is a TODO comment in FSHLog.java:

    TODO: replication may pick up these last edits though they have been marked
    as failed append (Need to keep our own file lengths, not rely on HDFS).

> [Replication] Inconsistency between Memstore and WAL may result in data in
> remote cluster that is not in the origin
> -------------------------------------------------------------------------------------------------------------------
>
>          Key: HBASE-14004
>          URL: https://issues.apache.org/jira/browse/HBASE-14004
>      Project: HBase
>   Issue Type: Bug
>   Components: regionserver
>     Reporter: He Liangliang
>     Priority: Critical
>       Labels: replication, wal
>
> Looks like the current write path can cause an inconsistency between the
> Memstore/HFile and the WAL, which leaves the slave cluster with more data
> than the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible for the HDFS sync RPC call to fail while the data has already
> been (perhaps partially) transported to the DataNodes, which eventually
> persist it. As a result, the handler will roll back the Memstore, and the
> later flushed HFile will also skip this record.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
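The gating idea in the comment above can be sketched roughly as follows. This is a hypothetical illustration, not the actual FSHLog/ReplicationSource code: the class name `SyncedIdGate` and its methods are invented for this sketch. It shows the two pieces the comment proposes: a volatile last-synced sequence id that bounds how far the replicator may read (the only cost being the memory barrier on the volatile write), and a "rolled" flag plus final id recorded at WAL roll so a replicator taking over the queue can tell a cleanly rolled file from one that must be read to EOF.

```java
// Hypothetical sketch of the proposal in the comment; names are invented.
public class SyncedIdGate {
    // Last sequence id known to be durably synced. Publishing it via a
    // volatile write is the "memory barrier" cost mentioned in the comment.
    private volatile long syncedSeqId = -1;
    // Set at WAL roll time: the file is finished and its final id is
    // recorded (in the real design, in the ZK replication queue).
    private volatile boolean rolled = false;
    private volatile long finalSeqId = -1;

    // Called by the write path after a successful WAL sync.
    public void onSyncSuccess(long seqId) {
        if (seqId > syncedSeqId) {
            syncedSeqId = seqId;
        }
    }

    // Called when the WAL is rolled: freeze the final id and mark rolled.
    public void onRoll() {
        finalSeqId = syncedSeqId;
        rolled = true;
    }

    // The replicator must never read edits past this id. Before the roll it
    // tracks the live synced id; after the roll it is the recorded final id.
    public long readLimit() {
        return rolled ? finalSeqId : syncedSeqId;
    }

    // A replicator that failed over checks this: if the final id was never
    // recorded, the file was not cleanly rolled and must be read until EOF.
    public boolean isFinished() {
        return rolled;
    }
}
```

A failed-over replicator would call `isFinished()`: if true, replicate exactly up to `readLimit()`; if false, fall back to reading until EOF, relying on the edits after the last successful sync being replayed as the comment describes.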