[
https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeffrey Zhong updated HBASE-8208:
---------------------------------
Attachment: hbase-8208.patch
{quote}
So, should we just call sync() in FSLog.startCacheFlush() regardless of the
replication state? It seems harmless.
{quote}
That's a good idea. I put sync() inside the function internalFlushcache instead of
FSHLog.startCacheFlush(), because that function is wrapped under
updatesLock.writelock while wal.sync does not seem to need the lock. I put sync()
before {code}mvcc.waitForRead(w);{code} to hopefully take some advantage of the
wait.
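A rough sketch of the ordering described above, not the attached patch. The class FlushOrderSketch and its events list are hypothetical stand-ins; only the names updatesLock, startCacheFlush, sync and waitForRead come from the discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the region flush path; not the real HRegion code. */
class FlushOrderSketch {
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    // Records each step and whether the write lock was held at that point.
    final List<String> events = new ArrayList<>();

    void internalFlushcache() {
        updatesLock.writeLock().lock();
        try {
            // startCacheFlush stays inside the write lock.
            events.add("startCacheFlush lockHeld="
                + updatesLock.isWriteLockedByCurrentThread());
        } finally {
            updatesLock.writeLock().unlock();
        }
        // The WAL sync runs after the write lock is released ...
        events.add("sync lockHeld=" + updatesLock.isWriteLockedByCurrentThread());
        // ... and before the MVCC wait.
        events.add("waitForRead");
    }
}
```

Calling internalFlushcache() once records the three steps in that order, with the lock held only for the first one.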
I also moved the check {code}txid <= this.syncedTillHere{code} to the beginning
of the function syncer(long txid), so it can sometimes skip acquiring this.updateLock.
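A minimal sketch of that early check, assuming a simplified syncer. The class SyncerSketch, the lockAcquisitions counter, and the "pretend sync" body are hypothetical and only illustrate the fast path; they are not the FSHLog implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the WAL syncer; not the real FSHLog code. */
class SyncerSketch {
    private final ReentrantLock updateLock = new ReentrantLock();
    private volatile long syncedTillHere = 0;
    // Counts how often the lock was actually taken (for illustration only).
    final AtomicInteger lockAcquisitions = new AtomicInteger();

    void syncer(long txid) {
        // Fast path: this txid is already synced, so return without locking.
        if (txid <= this.syncedTillHere) {
            return;
        }
        updateLock.lock();
        try {
            lockAcquisitions.incrementAndGet();
            // Re-check under the lock in case another thread synced meanwhile.
            if (txid <= this.syncedTillHere) {
                return;
            }
            // Placeholder for the real sync: just advance the marker.
            this.syncedTillHere = txid;
        } finally {
            updateLock.unlock();
        }
    }
}
```

With this shape, a call whose txid is at or below syncedTillHere never touches updateLock.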
Thanks,
-Jeffrey
> Data could not be replicated to slaves when deferredLogSync is enabled
> ----------------------------------------------------------------------
>
> Key: HBASE-8208
> URL: https://issues.apache.org/jira/browse/HBASE-8208
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.95.0, 0.98.0, 0.94.6
> Reporter: Jeffrey Zhong
> Fix For: 0.95.0, 0.98.0, 0.94.7
>
> Attachments: hbase-8208.patch
>
>
> This is a subtle issue. When deferredLogSync is enabled, there is a chance that
> we flush data before syncing all HLog entries. Suppose we have just flushed the
> internal cache and the server then dies with some unsynced HLog entries.
> That data is not lost at the source cluster, but because replication is based on
> WAL files, some changes we flushed at the source won't be replicated to the slave
> clusters.
> Although enabling deferredLogSync implies tolerating some data loss, this breaks
> the replication assumption that whatever is persisted in the source should be
> replicated to its slave clusters.
> In short, the slave cluster could end up with a double loss: the data lost in
> the source, plus some data stored in the source cluster that may not be
> replicated to the slaves either.
> The fix for this issue isn't hard. Basically, we can invoke sync during each
> flush when replication is enabled for a region server. Since sync returns
> immediately when there is nothing to sync, there should be no performance impact.
> Please let me know what you think!
> Thanks,
> -Jeffrey