[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042694#comment-15042694
 ] 

Phil Yang commented on HBASE-14004:
-----------------------------------

After the discussion in HBASE-14790, we can move forward now. Let me repost my 
comment from HBASE-14790 first :)
{quote}
Currently there are two scenarios which may result in inconsistency between the 
two clusters.

The first is that the master cluster crashes (for example, a power failure), or 
three DNs and the RS crash at the same time, so we lose all data that was not 
flushed to the DNs' disks even though that data has already been replicated to 
the slave cluster.

The second is that we roll back the MemStore and return an error to the client 
if we get an error on hflush, but the entry may nevertheless exist in the WAL. 
This not only results in inconsistency between the two clusters but also gives 
the client a wrong response, because the data will "revive" after the WAL is 
replayed. This scenario has been discussed in HBASE-14004.

Compared to the second, the first scenario is easier to solve: we can tell 
ReplicationSource to read only the logs that are already persisted on three 
disks. To do that we need to know the largest WAL entry id that has been 
synced. HDFS's internal sync logic is not helpful here, so we must use hsync to 
let HBase learn that entry id. Hence we need a configurable periodic hsync; 
even with a single cluster it helps reduce data loss from a data-center power 
failure or the unlucky case of three DNs and the RS crashing at the same time.

The second scenario is more complex, because we cannot roll back the MemStore 
and tell the client the operation failed unless we are very sure the data will 
never exist in the WAL, and usually we cannot be sure... So we need new WAL 
logic that rewrites the entry to a new file rather than rolling back. To 
implement this we must handle duplicate entries while replaying the WAL.
{quote}
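To make the first fix concrete, here is a minimal sketch of a periodic hsync 
that publishes the highest txid known to be durable, which ReplicationSource 
could then honor as its read limit. All class and method names below 
(PeriodicHsyncTracker, safeReadLimit, etc.) are hypothetical, not real HBase 
APIs:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a periodic-hsync tracker: appends hand out txids, a timer task
// hsyncs and then publishes the highest txid that is durable on disk.
// Hypothetical names; not the actual HBase WAL implementation.
class PeriodicHsyncTracker {
    private final AtomicLong lastAppendedTxid = new AtomicLong(0);
    private final AtomicLong highestHsyncedTxid = new AtomicLong(0);

    // Called on every WAL append; returns the entry's txid.
    long append() {
        return lastAppendedTxid.incrementAndGet();
    }

    // Called from a configurable timer. After the (simulated) hsync returns,
    // everything up to 'target' is durable, so publish it.
    void periodicHsync() {
        long target = lastAppendedTxid.get();
        // stream.hsync();  // real code would hsync the underlying output stream
        highestHsyncedTxid.set(target);
    }

    // ReplicationSource would refuse to ship entries beyond this txid.
    long safeReadLimit() {
        return highestHsyncedTxid.get();
    }
}
```

The point of the two counters is that the read limit only ever advances after 
an hsync completes, never on a mere append.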

Therefore, we may have 4 subtasks:
1: A configurable periodic hsync to make sure our data has been saved to disk. 
It is also helpful in single-cluster mode.
2: ReplicationSource should only read WAL entries that have been hsynced, to 
prevent the slave cluster from having data that the master has lost.
3: The WAL reader can handle duplicate entries; in other words, make WAL 
logging idempotent.
4: Fix the HBase write path so that we retry logging the WAL entry to a new 
file rather than rolling back the MemStore.
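For subtask 3, a sketch of what idempotent replay could mean: since a failed 
hflush can leave the same entry in both the old and the rewritten WAL file, 
replay applies each sequence id at most once. The names here are hypothetical:

```java
import java.util.*;

// Sketch of idempotent WAL replay: entries rewritten into a new file after a
// sync failure can duplicate sequence ids already seen, so replay skips any
// seqId at or below the highest one already applied. Hypothetical names.
class IdempotentReplayer {
    private long highestApplied = -1;
    private final List<Long> applied = new ArrayList<>();

    // Apply an entry only if its sequence id has not been applied yet;
    // returns false when the entry is a duplicate and was skipped.
    boolean replay(long seqId) {
        if (seqId <= highestApplied) {
            return false;        // duplicate from the rewritten file: skip
        }
        applied.add(seqId);      // real code would re-insert into the MemStore
        highestApplied = seqId;
        return true;
    }

    List<Long> appliedEntries() {
        return applied;
    }
}
```

This relies on sequence ids being assigned once at append time and carried 
unchanged into the rewritten file, so a duplicate is detectable by id alone.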

Thoughts?
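As a rough illustration of subtask 4, here is a sketch of a write path that, 
on a sync failure, rolls to a new WAL file and rewrites the entry instead of 
rolling back the MemStore, assuming replay dedupes by sequence id as above. 
All names (RetryingWalWriter, appendAndSync, etc.) are hypothetical:

```java
// Sketch of retry-in-a-new-file: the MemStore insert is never rolled back,
// because after a failed sync the entry may or may not already be in the old
// file; instead we roll a fresh WAL and rewrite. Hypothetical names.
class RetryingWalWriter {
    interface Wal {
        void appendAndSync(String entry) throws java.io.IOException;
    }

    private final java.util.function.Supplier<Wal> walRoller; // rolls a new file
    private final int maxAttempts;

    RetryingWalWriter(java.util.function.Supplier<Wal> walRoller, int maxAttempts) {
        this.walRoller = walRoller;
        this.maxAttempts = maxAttempts;
    }

    // Returns true once the entry is durably logged in some file.
    boolean write(String entry) {
        for (int i = 0; i < maxAttempts; i++) {
            try {
                walRoller.get().appendAndSync(entry);  // fresh file each attempt
                return true;
            } catch (java.io.IOException e) {
                // Sync failed: the entry might still be in the old file, so we
                // cannot roll back; roll a new file and rewrite (replay dedupes).
            }
        }
        return false; // caller fails the operation, still without rollback
    }
}
```

The key design point is that failure handling moves from "undo the MemStore" 
to "keep rewriting until one copy is durable", pushing duplicate handling onto 
replay.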

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14004
>                 URL: https://issues.apache.org/jira/browse/HBASE-14004
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: He Liangliang
>            Priority: Critical
>              Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between the 
> MemStore/HFile and the WAL, which causes the slave cluster to have more data 
> than the master cluster.
> The simplified write path looks like:
> 1. insert the record into the MemStore
> 2. write the record to the WAL
> 3. sync the WAL
> 4. roll back the MemStore if 3 fails
> It's possible for the HDFS sync RPC call to fail while the data has already 
> (perhaps partially) been transported to the DNs and finally gets persisted. 
> As a result, the handler will roll back the MemStore, and the later flushed 
> HFile will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
