[
https://issues.apache.org/jira/browse/ZOOKEEPER-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290405#comment-16290405
]
ASF GitHub Bot commented on ZOOKEEPER-2789:
-------------------------------------------
Github user asdf2014 commented on the issue:
https://github.com/apache/zookeeper/pull/262
Hi, @phunt. Indeed, the `FastLeaderElection` algorithm is very efficient.
Most leader elections finish within a few hundred milliseconds.
However, some real-time stream frameworks, such as Apache Kafka and Apache Storm,
can put a lot of pressure on the ZooKeeper cluster when they carry too
much business data or processing logic. In that case, leader election may be
triggered very frequently and the process becomes time-consuming.
> Reassign `ZXID` for solving 32bit overflow problem
> --------------------------------------------------
>
> Key: ZOOKEEPER-2789
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2789
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.5.3
> Reporter: Benedict Jin
> Assignee: Benedict Jin
> Fix For: 3.6.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> At `1k/s` ops, the 32-bit `counter` of the `ZXID` will be exhausted after
> $2^{32} / (86400 \times 1000) \approx 49.7$ days. But if we reassign the
> `ZXID` as 16 bits for `epoch` and 48 bits for `counter`, then the problem
> will not occur until after
> $\min(2^{16} / 365, 2^{48} / (86400 \times 1000 \times 365)) \approx \min(179.6, 8925.5) = 179.6$
> years.
> However, I believe the `ZXID` is of type `long`, and the JVM may split a read
> or write of a `long` (and likewise a `double`) into separate operations on
> the high 32 bits and the low 32 bits. Because the `ZXID` variable is not
> declared `volatile` and is not boxed as the corresponding reference type
> (`Long` / `Double`), such an access is a
> [non-atomic operation](https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.7).
> Thus, if the `epoch` and `counter` fields no longer line up with the upper
> and lower 32 bits of the `long`, there may be a concurrency problem.
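
The bit split and the exhaustion estimate in the quoted description can be sketched roughly as follows. This is only an illustration, not ZooKeeper's actual code: the class `ZxidLayout` and all of its methods are hypothetical names, and the sketch assumes the `epoch` sits in the high bits and the `counter` in the low bits of a single `long`.

```java
// Hypothetical helper for illustration only; not part of ZooKeeper's API.
public final class ZxidLayout {
    // Number of low-order bits reserved for the counter (32 today, 48 proposed).
    private final int counterBits;

    public ZxidLayout(int counterBits) {
        if (counterBits <= 0 || counterBits >= 64) {
            throw new IllegalArgumentException("counterBits must be in 1..63");
        }
        this.counterBits = counterBits;
    }

    private long counterMask() {
        return (1L << counterBits) - 1;
    }

    // Pack epoch (high bits) and counter (low bits) into one 64-bit value.
    // An epoch wider than the remaining high bits is silently truncated by the shift.
    public long make(long epoch, long counter) {
        return (epoch << counterBits) | (counter & counterMask());
    }

    // Unsigned shift so the epoch is never reported as negative.
    public long epochOf(long zxid) {
        return zxid >>> counterBits;
    }

    public long counterOf(long zxid) {
        return zxid & counterMask();
    }

    // Days until the counter wraps at a steady write rate (86400 seconds per day).
    public double daysUntilCounterOverflow(double opsPerSecond) {
        return Math.pow(2, counterBits) / (opsPerSecond * 86_400);
    }

    public static void main(String[] args) {
        ZxidLayout current = new ZxidLayout(32);
        ZxidLayout proposed = new ZxidLayout(48);
        System.out.println("32-bit counter, days: "
                + current.daysUntilCounterOverflow(1000));
        System.out.println("48-bit counter, years: "
                + proposed.daysUntilCounterOverflow(1000) / 365);
    }
}
```

At 1000 ops/s the two calls in `main` evaluate to roughly 49.7 days and 8925.5 years, which matches the counter-side figures in the description. The 179.6-year bound comes from the 16-bit `epoch` term instead, which this sketch does not model.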
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)