[
https://issues.apache.org/jira/browse/ZOOKEEPER-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291100#comment-16291100
]
ASF GitHub Bot commented on ZOOKEEPER-2789:
-------------------------------------------
Github user breed commented on the issue:
https://github.com/apache/zookeeper/pull/262
i think it would be much better to extend ZOOKEEPER-1277 to more
transparently do the rollover without a full leader election.
the main issue i have with shortening the epoch size is that once the epoch
hits the maximum value the ensemble is stuck, nothing can proceed, so we really
need to keep the epoch size big enough that we would never hit that condition.
i don't think a 16-bit epoch satisfies that requirement.
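
To make the epoch-size concern concrete, here is a minimal sketch of how a 64-bit zxid can be split into an epoch (high bits) and a counter (low bits), and how the largest representable epoch shrinks when the counter is widened. The class `ZxidLayoutSketch` and its methods are hypothetical names for illustration, not ZooKeeper's actual code; the sketch assumes the epoch is treated as an unsigned field.

```java
public final class ZxidLayoutSketch {
    private ZxidLayoutSketch() {}

    /** Packs epoch and counter into one long; the low counterBits bits hold the counter. */
    public static long makeZxid(long epoch, long counter, int counterBits) {
        long counterMask = (1L << counterBits) - 1;
        return (epoch << counterBits) | (counter & counterMask);
    }

    /** Extracts the epoch, treating it as unsigned (an assumption of this sketch). */
    public static long epochOf(long zxid, int counterBits) {
        return zxid >>> counterBits;
    }

    /** Extracts the counter from the low counterBits bits. */
    public static long counterOf(long zxid, int counterBits) {
        return zxid & ((1L << counterBits) - 1);
    }

    /** Largest epoch that fits in the bits left over after the counter. */
    public static long maxEpoch(int counterBits) {
        return (1L << (64 - counterBits)) - 1;
    }

    public static void main(String[] args) {
        // 32-bit epoch / 32-bit counter layout
        System.out.println("max epoch, 32/32 split: " + maxEpoch(32)); // 4294967295
        // Proposed 16-bit epoch / 48-bit counter layout
        System.out.println("max epoch, 16/48 split: " + maxEpoch(48)); // 65535

        long zxid = makeZxid(5, 42, 48);
        System.out.println("epoch=" + epochOf(zxid, 48) + " counter=" + counterOf(zxid, 48));
    }
}
```

With a 16-bit epoch only 65535 epoch changes are available before the field is exhausted, which is the condition described above as leaving the ensemble stuck.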
> Reassign `ZXID` for solving 32bit overflow problem
> --------------------------------------------------
>
> Key: ZOOKEEPER-2789
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2789
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.5.3
> Reporter: Benedict Jin
> Assignee: Benedict Jin
> Fix For: 3.6.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> At `1k/s` ops, the 32-bit counter part of the ZXID will be exhausted after
> $2^{32} / (86400 \times 1000) \approx 49.7$ days. But if we reassign the
> `ZXID` as 16 bits for `epoch` and 48 bits for `counter`, then the problem
> will not occur until after
> $\min(2^{16} / 365,\ 2^{48} / (86400 \times 1000 \times 365)) \approx \min(179.6, 8925.5) = 179.6$
> years (the first term assumes one epoch change per day).
> However, the ZXID is a `long`, and the JVM may split a read or write of a
> `long` (and likewise of a `double`) into two operations, one on the high
> 32 bits and one on the low 32 bits. Because the `ZXID` variable is not
> declared `volatile` and is not boxed as the corresponding reference type
> (`Long` / `Double`), accessing it is a [non-atomic operation]
> (https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.7).
> Thus, if part of the upper 32 bits is reassigned to the counter, so that the
> counter no longer sits entirely within one 32-bit half of the `long`, there
> may be a concurrency problem.
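
The exhaustion estimates in the quoted description can be reproduced with a short calculation. This is only a sketch: the class and method names are hypothetical, the `1k/s` write rate comes from the description, and the one-epoch-change-per-day rate is the assumption implied by the `2^16 / 365` term.

```java
public final class ZxidExhaustionEstimate {
    private static final double SECONDS_PER_DAY = 86_400;
    private static final double DAYS_PER_YEAR = 365;

    private ZxidExhaustionEstimate() {}

    /** Days until a counter of the given width overflows at a steady write rate. */
    public static double daysUntilCounterOverflow(int counterBits, double opsPerSecond) {
        return Math.pow(2, counterBits) / (SECONDS_PER_DAY * opsPerSecond);
    }

    /** Years until an epoch of the given width overflows at a steady election rate. */
    public static double yearsUntilEpochOverflow(int epochBits, double epochChangesPerDay) {
        return Math.pow(2, epochBits) / (epochChangesPerDay * DAYS_PER_YEAR);
    }

    public static void main(String[] args) {
        double opsPerSecond = 1000;      // from the issue description
        double epochChangesPerDay = 1;   // assumption implied by 2^16 / 365

        // Current layout: 32-bit counter
        System.out.printf("32-bit counter: %.1f days%n",
                daysUntilCounterOverflow(32, opsPerSecond));

        // Proposed layout: 16-bit epoch, 48-bit counter
        double epochYears = yearsUntilEpochOverflow(16, epochChangesPerDay);
        double counterYears = daysUntilCounterOverflow(48, opsPerSecond) / DAYS_PER_YEAR;
        System.out.printf("16-bit epoch: %.1f years, 48-bit counter: %.1f years%n",
                epochYears, counterYears);
        System.out.printf("limit: %.1f years%n", Math.min(epochYears, counterYears));
    }
}
```

Under these assumptions the 16-bit epoch, not the 48-bit counter, becomes the limiting field.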
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)