[
https://issues.apache.org/jira/browse/ZOOKEEPER-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051530#comment-16051530
]
ASF GitHub Bot commented on ZOOKEEPER-2789:
-------------------------------------------
Github user yunfan123 commented on the issue:
https://github.com/apache/zookeeper/pull/262
Hi, @asdf2014
In most cases, I don't think the epoch can overflow 16 bits.
Leader election in ZooKeeper is generally very rare, and it can take several
seconds or even several minutes to complete.
ZooKeeper is completely unavailable during leader election.
If your ZooKeeper deployment can overflow a 16-bit epoch, then that
deployment is thoroughly unreliable anyway.
Finally, compatibility with old versions is really important.
Without it, I would have to restart all of my ZooKeeper nodes.
Every node would need to reload its snapshot and log from disk, which would
take a lot of time.
I believe such an upgrade process is unacceptable to most ZooKeeper users.
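
For context, the proposal under discussion moves the boundary between epoch and counter inside the 64-bit zxid from 32/32 to 16/48. The sketch below is only an illustration of why nodes using different layouts cannot interoperate; the class and method names are hypothetical and are not ZooKeeper's actual utility code.

```java
class ZxidLayoutSketch {
    // Hypothetical helper: pack an epoch and a counter into one long,
    // giving the low `counterBits` bits to the counter.
    static long pack(long epoch, long counter, int counterBits) {
        long counterMask = (1L << counterBits) - 1;
        return (epoch << counterBits) | (counter & counterMask);
    }

    // Hypothetical helper: read the epoch back under a given layout.
    static long epochOf(long zxid, int counterBits) {
        return zxid >>> counterBits;
    }

    // Hypothetical helper: read the counter back under a given layout.
    static long counterOf(long zxid, int counterBits) {
        return zxid & ((1L << counterBits) - 1);
    }

    public static void main(String[] args) {
        // A zxid written by a node that uses the 32/32 layout.
        long zxid = pack(5, 42, 32);

        // Decoded with the same layout: epoch 5, counter 42.
        System.out.println(epochOf(zxid, 32) + " / " + counterOf(zxid, 32));

        // Decoded by a node that assumes the 16/48 layout:
        // epoch 0, counter 21474836522.
        System.out.println(epochOf(zxid, 48) + " / " + counterOf(zxid, 48));

        // Largest epoch that fits in 16 bits: 65535.
        System.out.println((1L << 16) - 1);
    }
}
```

Because the same `long` decodes to a different epoch under each layout, a mixed-version ensemble would disagree about which transaction is newer, which is the compatibility concern raised above.
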
> Reassign `ZXID` for solving 32bit overflow problem
> --------------------------------------------------
>
> Key: ZOOKEEPER-2789
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2789
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.5.3
> Reporter: Benedict Jin
> Assignee: Benedict Jin
> Fix For: 3.6.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> At `1k/s` ops, the ZXID counter will be exhausted after only
> $2^{32} / (86400 \times 1000) \approx 49.7$ days. But if we reassign the
> `ZXID` to use 16 bits for the `epoch` and 48 bits for the `counter`, the
> problem will not occur for
> $\min(2^{16} / 365,\ 2^{48} / (86400 \times 1000 \times 365)) \approx \min(179.6, 8925.5) = 179.6$
> years (the first term corresponds to one epoch change per day).
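
The estimate above can be reproduced with a few lines of Java. This is a rough sketch: the `1k/s` write rate comes from the issue text, while the rate of one epoch change per day is an assumption read off the `2^16 / 365` term. The class and method names are hypothetical.

```java
class ZxidExhaustionEstimate {
    static final double OPS_PER_SECOND = 1000.0;   // `1k/s` from the issue text
    static final double SECONDS_PER_DAY = 86_400.0;
    static final double DAYS_PER_YEAR = 365.0;

    // Days until a counter of the given width wraps at OPS_PER_SECOND.
    static double daysUntilCounterWraps(int counterBits) {
        return Math.pow(2, counterBits) / (OPS_PER_SECOND * SECONDS_PER_DAY);
    }

    // Years until an epoch of the given width wraps, for an assumed
    // number of leader elections (epoch changes) per day.
    static double yearsUntilEpochWraps(int epochBits, double electionsPerDay) {
        return Math.pow(2, epochBits) / (electionsPerDay * DAYS_PER_YEAR);
    }

    public static void main(String[] args) {
        // Current 32-bit counter: about 49.7 days.
        System.out.printf("32-bit counter: %.1f days%n", daysUntilCounterWraps(32));

        // Proposed 48-bit counter: about 8925.5 years.
        double counterYears = daysUntilCounterWraps(48) / DAYS_PER_YEAR;
        System.out.printf("48-bit counter: %.1f years%n", counterYears);

        // Proposed 16-bit epoch at one election per day: about 179.6 years.
        double epochYears = yearsUntilEpochWraps(16, 1.0);
        System.out.printf("16-bit epoch:   %.1f years%n", epochYears);

        // Whichever field wraps first bounds the lifetime of the layout.
        System.out.printf("limit:          %.1f years%n", Math.min(epochYears, counterYears));
    }
}
```
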
> However, I believe that because the ZXID is a `long`, the JVM may split a
> read or write of it (the same applies to `double`) into separate operations
> on the high 32 bits and the low 32 bits. Because the `ZXID` variable is not
> declared `volatile` and is not boxed in the corresponding reference type
> (`Long` / `Double`), accessing it is a [non-atomic operation]
> (https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.7).
> Thus, if the `long` is split exactly into its upper 32 bits and its lower
> 32 bits, there may be a concurrency problem.
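
As a minimal sketch of the atomicity point in the description: JLS 17.7 permits a write to a non-volatile `long` to be performed as two 32-bit writes, and guarantees atomic reads and writes once the field is `volatile`. The holder class below is hypothetical and is not taken from the ZooKeeper code base; it only shows two standard ways to avoid a torn 64-bit value.

```java
import java.util.concurrent.atomic.AtomicLong;

class ZxidHolderSketch {
    // volatile makes each single read or write of the long atomic (JLS 17.7).
    private volatile long lastSeenZxid;

    // AtomicLong additionally makes read-modify-write steps atomic.
    private final AtomicLong nextZxid = new AtomicLong();

    void setLastSeenZxid(long zxid) { lastSeenZxid = zxid; }

    long getLastSeenZxid() { return lastSeenZxid; }

    long allocateZxid() { return nextZxid.incrementAndGet(); }

    // Runs 4 threads that each allocate 1000 ids and returns the last value
    // handed out, which is 4000 when no increment is lost.
    static long runDemo() {
        ZxidHolderSketch holder = new ZxidHolderSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    holder.setLastSeenZxid(holder.allocateZxid());
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return holder.nextZxid.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

A plain `volatile long` is enough when one thread writes and others only read; `AtomicLong` is the safer choice when several threads increment the same value.
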
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)