Nice work on this root cause analysis!!



On Sun, Oct 27, 2019, 6:07 PM Fangmin Lv <[email protected]> wrote:

> Hi everyone,
>
> Sharing the following information with ZK dev as well for more visibility.
>
> I spent most of the last few weeks investigating the root cause; sorry if
> I haven't responded in time in the open source community. I will catch up
> on the recently opened PRs.
>
> ---------- Forwarded message ---------
> From: Fangmin Lv <[email protected]>
> Date: Sun, Oct 27, 2019 at 6:02 PM
> Subject: Weird inconsistency bugs we saw recently with ZK
> To: <[email protected]>
>
>
> Hey everyone,
>
> I'd like to share some weird inconsistency bugs we saw recently in prod,
> along with the root cause and potential fixes. It took us around a month
> to investigate, reproduce, and find the root cause; hopefully the
> information here will help people avoid hitting the same issue.
>
> [Trigger conditions and behavior]
>
> The inconsistency only happened when running ZK with OpenJDK 10 on SKL
> (Skylake) machines. It is not caused by a bug inside ZK but by a
> macro-assembly bug inside the JDK.
>
> The observed symptoms may include:
>
> * NONODE returned by getData for a child that getChildren reports as
> existing, with no delete issued
> * NONODE returned when trying to create a child under a parent node that
> was just successfully created, with no delete issued
> * No client able to acquire the lock even though the previous session
> that held the lock is already dead
>
> [Root cause]
>
> The direct cause of the misbehavior above is that keys/values put into
> the ZooKeeperServer.outstandingChangesForPath HashMap or the
> DataNode.children HashSet are not visible to later get or remove calls.
> As a result, outstanding changes are not visible when the leader prepares
> the following txns, or a node is deleted but not removed from
> DataNode.children.
>
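To make the effect on the collections concrete, here is a minimal sketch of
what a HashSet does when equals() wrongly answers "not equal". It does not
reproduce the JDK bug itself: FlakyKey, simulateBug and StaleChildDemo are
hypothetical names invented for this illustration, and the flag merely stands
in for the faulty String.equals result.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key type for illustration only; this is not ZooKeeper code.
final class FlakyKey {
    // When true, equals() wrongly reports "not equal", standing in for the
    // faulty String.equals result described in the email (an assumption).
    static boolean simulateBug = false;

    private final String path;

    FlakyKey(String path) {
        this.path = path;
    }

    @Override
    public int hashCode() {
        return path.hashCode(); // hashing is unaffected; only equality is broken
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FlakyKey)) {
            return false;
        }
        if (simulateBug) {
            return false; // simulated wrong answer
        }
        return path.equals(((FlakyKey) o).path);
    }
}

final class StaleChildDemo {
    /** Returns true if the child is still in the set after an attempted remove. */
    static boolean childSurvivesRemove(boolean bug) {
        Set<FlakyKey> children = new HashSet<>();
        children.add(new FlakyKey("child-1"));
        FlakyKey.simulateBug = bug;
        try {
            // A separate instance is used, so the set has to consult equals().
            children.remove(new FlakyKey("child-1"));
        } finally {
            FlakyKey.simulateBug = false;
        }
        return children.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println("healthy equals, child survives: " + childSurvivesRemove(false));
        System.out.println("broken equals, child survives: " + childSurvivesRemove(true));
    }
}
```

The hash lookup still finds the right bucket; the entry is skipped only
because the final equality check fails, which matches the "deleted but not
removed" symptom described above.
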
> The 'bad' HashMap/HashSet behavior is not caused by concurrency bugs
> inside ZK, but by a bug in the macro assembler used to generate the
> String.equals intrinsic assembly code in JDK 9 and 10. The bug was
> introduced in JDK-8144771, which added AVX-512 instruction support to the
> JDK to optimize the String.equals intrinsic with 512-bit vector
> operations. Because of the bug, String.equals may return a wrong result
> when the upper CPU registers (xmm16 - xmm31) are used with a non-empty
> stack on SKL machines where AVX-512 is available.
>
> The macro-assembly bug we hit is in vptest, which is used in the
> string_compare macro assembly code
> <
> http://hg.openjdk.java.net/jdk/jdk10/file/b09e56145e11/src/hotspot/cpu/x86/macroAssembler_x86.cpp#l4933
> >.
> It uses add/sub instructions when temporarily saving and restoring
> register values on the stack, which overwrites the ZF (zero flag) in the
> FLAGS register set by the preceding test instruction.
>
> In our case, if the key exists in the DataNode.children HashSet, the test
> instruction result is zero and the ZF bit is set to 1. If the RSP value
> is not 0 (e.g. the stack is not empty) after the addptr code here, the ZF
> bit is changed to 0, so the String.equals comparison during removeNode
> returns false and the key is not removed.
>
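The flag clobbering can be sketched with a toy model of the zero flag. This
is not HotSpot code and it models nothing beyond ZF and RSP; ZeroFlagModel,
its methods, the starting stack pointer and the 64-byte adjustment are all
assumptions made for illustration.

```java
// Toy model of the x86 zero flag; hypothetical names, not HotSpot code.
final class ZeroFlagModel {
    private boolean zf;
    private long rsp;

    ZeroFlagModel(long initialRsp) {
        this.rsp = initialRsp;
    }

    /** Models a test-style instruction: ZF = 1 when no differing bits are found. */
    void test(long diffBits) {
        zf = (diffBits == 0);
    }

    /** Models ADD on the stack pointer: the arithmetic result rewrites ZF. */
    void addToStackPointer(long bytes) {
        rsp += bytes;
        zf = (rsp == 0);
    }

    boolean zeroFlag() {
        return zf;
    }

    /** Returns the "strings are equal" answer that a branch on ZF would observe. */
    static boolean equalsAsSeenByBranch(long diffBits, boolean restoreRegisterViaAdd) {
        // Arbitrary non-zero stack pointer, i.e. a non-empty stack (assumed value).
        ZeroFlagModel cpu = new ZeroFlagModel(0x7ffc_0000_1000L);
        cpu.test(diffBits);
        if (restoreRegisterViaAdd) {
            cpu.addToStackPointer(64); // pop a temporarily saved register
        }
        return cpu.zeroFlag();
    }

    public static void main(String[] args) {
        System.out.println("equal strings, no add:   " + equalsAsSeenByBranch(0, false));
        System.out.println("equal strings, with add: " + equalsAsSeenByBranch(0, true));
    }
}
```

In this model, equal inputs are reported as equal only when nothing touches
the flags between the test and the branch; unequal inputs are reported as
unequal either way, so only the "equal" case is misreported.
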
> A related bug is reported in JDK-8207746. The behavior described there is
> different, but we've confirmed the issue by adding assembly code to log it
> in JDK 10.
>
> [Solutions]
>
> The possible mitigations are:
>
> 1. Disabling AVX-512 with the JVM option -XX:UseAVX=2
> 2. Using an OpenJDK version higher than 10, where the issue was fixed in
> JDK-8207746
>
> Upgrading to OpenJDK 11+ is the better option, since 10 is not well
> supported and AVX-512 does help improve performance.
>
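As a rough aid for applying the mitigations above, a startup check along
these lines could flag a JVM that may be exposed. It is only a sketch:
Avx512ExposureCheck and mayBeExposed are hypothetical names, the check looks
only at the Java version and the -XX:UseAVX command-line flag, and it cannot
tell whether the CPU actually supports AVX-512.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration; not part of ZooKeeper or the JDK.
final class Avx512ExposureCheck {
    private static final String FLAG = "-XX:UseAVX=";

    /**
     * Returns true when the Java version is 9 or 10 and the JVM arguments do
     * not cap AVX at level 2 or below. The last -XX:UseAVX occurrence is used.
     */
    static boolean mayBeExposed(String specVersion, List<String> jvmArgs) {
        boolean affectedJdk = "9".equals(specVersion) || "10".equals(specVersion);
        if (!affectedJdk) {
            return false;
        }
        String lastValue = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                lastValue = arg.substring(FLAG.length());
            }
        }
        boolean capped = "0".equals(lastValue) || "1".equals(lastValue) || "2".equals(lastValue);
        return !capped;
    }

    public static void main(String[] args) {
        String specVersion = System.getProperty("java.specification.version");
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (mayBeExposed(specVersion, jvmArgs)) {
            System.err.println("Warning: JDK " + specVersion
                    + " without -XX:UseAVX=2; may hit the String.equals issue on AVX-512 hardware");
        } else {
            System.out.println("No exposure detected by this check");
        }
    }
}
```

The decision logic is kept in a pure method so it can be exercised without
starting a JVM with particular flags.
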
> We use JDK 10 because of the SSL quorum socket close stall issue mentioned
> in ZOOKEEPER-3384 <https://issues.apache.org/jira/browse/ZOOKEEPER-3384>,
> and because the SO_LINGER option is not honored in JDK 11. We've unblocked
> JDK 11 by closing the quorum socket asynchronously, and we're upstreaming
> that in
> ZOOKEEPER-3574 <https://issues.apache.org/jira/browse/ZOOKEEPER-3574>.
>
> Thanks,
> Fangmin
>
