[
https://issues.apache.org/jira/browse/HBASE-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bharath Vissapragada updated HBASE-24603:
-----------------------------------------
Fix Version/s: 2.2.6
2.1.10
> Zookeeper sync() call is async
> ------------------------------
>
> Key: HBASE-24603
> URL: https://issues.apache.org/jira/browse/HBASE-24603
> Project: HBase
> Issue Type: Improvement
> Components: master, regionserver
> Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
> Reporter: Bharath Vissapragada
> Assignee: Bharath Vissapragada
> Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0, 2.1.10, 2.2.6
>
>
> Here is the method that does a sync() of lagging followers with leader in the
> quorum. We rely on this to see a consistent snapshot of ZK data from multiple
> clients. However the problem is that the underlying sync() call is actually
> asynchronous since we are passing a 'null' call back. See the ZK API
> [doc|https://zookeeper.apache.org/doc/r3.5.7/apidocs/zookeeper-server/index.html]
> for details. The end-result is that sync() doesn't guarantee that it has
> happened by the time it returns.
> {noformat}
> /**
> * Forces a synchronization of this ZooKeeper client connection.
> * <p>
> * Executing this method before running other methods will ensure that the
> * subsequent operations are up-to-date and consistent as of the time that
> * the sync is complete.
> * <p>
> * This is used for compareAndSwap type operations where we need to read the
> * data of an existing node and delete or transition that node, utilizing
> the
> * previously read version and data. We want to ensure that the version
> read
> * is up-to-date from when we begin the operation.
> */
> public void sync(String path) throws KeeperException {
> this.recoverableZooKeeper.sync(path, null, null);
> }
> {noformat}
> We rely on this heavily (at least in the older branches that do ZK based
> region assignment). In branch-1 we saw weird "BadVersionException" exceptions
> in RITs because of the inconsistent view of the ZK snapshot. It could
> manifest differently in other branches. Either way, this is something we need
> to fix.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)