[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044616#comment-15044616 ]

He Liangliang commented on HBASE-14004:
---------------------------------------

Sorry for the late reply. What about this approach (a sketch follows below):
1. Keep the current synced sequence id in memory and make sure the replicator
does not read past this id when replicating.
2. When rolling the WAL, record the final sequence id in the ZK replication
queue and also mark the file as rolled in memory so the local replicator knows
this file is finished.
3. When the replication queue is failed over, the new replicator checks for
the recorded sequence id: if it is available, the file was rolled
successfully; otherwise, read until EOF. In the latter case, we need to make
sure the edits after the last successful sync of that log are replayed to
ensure consistency. Recording the last successfully synced sequence id when
flushing can guarantee this.

The overhead is insignificant (just a memory barrier for the volatile
sequence id made visible to the replicator).
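
A minimal sketch of this bookkeeping (class and field names are illustrative,
and an in-memory map stands in for the ZK replication queue; this is not
actual HBase code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WalReplicationBound {
  // Step 1: last successfully synced sequence id, visible to the replicator
  // via a volatile read (the only added cost is the memory barrier).
  private volatile long lastSyncedSeqId = -1;

  // Stand-in for the ZK replication queue (WAL path -> final sequence id).
  private final Map<String, Long> zkReplicationQueue = new ConcurrentHashMap<>();
  // Files rolled cleanly, so the local replicator knows they are finished.
  private final Map<String, Boolean> rolledFiles = new ConcurrentHashMap<>();

  // Called by the WAL writer after each successful sync.
  void onSyncSuccess(long seqId) {
    lastSyncedSeqId = seqId;
  }

  // The replicator must never read past this id on the live WAL file.
  long replicationReadLimit() {
    return lastSyncedSeqId;
  }

  // Step 2: on a clean roll, record the final id in the (ZK) queue and mark
  // the file rolled in memory.
  void onWalRoll(String walPath, long finalSeqId) {
    zkReplicationQueue.put(walPath, finalSeqId);
    rolledFiles.put(walPath, true);
  }

  // Step 3: on failover, the new replicator checks for a recorded final id.
  // If present, the file was rolled cleanly and is read up to that id;
  // otherwise it is read until EOF, relying on the edits after the last
  // successful sync having been replayed (guaranteed by the flush-time
  // record of the last synced id).
  long failoverReadLimit(String walPath) {
    Long finalId = zkReplicationQueue.get(walPath);
    return finalId != null ? finalId : Long.MAX_VALUE; // MAX_VALUE ~ read to EOF
  }
}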

I guess this may have been the original design, since there are TODO comments
in FSHLog.java:
TODO:
 * replication may pick up these last edits though they have been marked as failed append
 * (Need to keep our own file lengths, not rely on HDFS).
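
For the "keep our own file lengths" part, a minimal sketch (illustrative only,
not the FSHLog implementation): replication readers would stop at the last
successfully synced offset instead of the length HDFS reports.

import java.util.concurrent.atomic.AtomicLong;

class TrackedWalLength {
  // Our own notion of the WAL's replicatable length; a failed append never
  // advances it, so replication cannot pick up those trailing edits.
  private final AtomicLong syncedLength = new AtomicLong(0);

  // Advance only after a sync succeeds.
  void onSyncSuccess(long lengthAfterSync) {
    syncedLength.set(lengthAfterSync);
  }

  // Replication readers stop at this offset rather than the HDFS-reported EOF.
  long safeReadLength() {
    return syncedLength.get();
  }
}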

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14004
>                 URL: https://issues.apache.org/jira/browse/HBASE-14004
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: He Liangliang
>            Priority: Critical
>              Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between
> memstore/hfile and WAL, which can result in the slave cluster having more
> data than the master cluster.
> The simplified write path looks like this:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data has already
> been (perhaps partially) transported to the DNs and eventually gets
> persisted. As a result, the handler will roll back the Memstore, and the
> HFile flushed later will also skip this record.


