[
https://issues.apache.org/jira/browse/CASSANDRA-15551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080606#comment-17080606
]
Gianluca Righetto edited comment on CASSANDRA-15551 at 4/10/20, 4:11 PM:
-------------------------------------------------------------------------
The issue here is that once
{{StorageService.instance.getTokenMetadata().clearUnsafe()}} is executed in
MoveTest's @Before method, the {{GossipStage}} thread kicks in and starts
evicting the stale endpoints from membership. That eviction can still be in
progress while the next test method is already running.
To reproduce this in an IDE, you can set breakpoints at:
[https://github.com/apache/cassandra/blob/1ce3c1c039561c15892115af37e0c7abf260bc6b/test/unit/org/apache/cassandra/Util.java#L222]
and
[https://github.com/apache/cassandra/blob/1ce3c1c039561c15892115af37e0c7abf260bc6b/src/java/org/apache/cassandra/gms/Gossiper.java#L524]
If the main thread starts executing the second iteration of the loop in
{{createInitialRing}} while the GossipStage thread is removing the endpoints in
{{evictFromMembership}}, the main thread later throws an NPE.
The fix I submitted makes the main thread wait for all endpoints to be evicted
between tests, so that the next test starts from a clean state.
Pull request: [https://github.com/apache/cassandra/pull/533]
Java 11 Unit Tests results: [https://circleci.com/gh/grighetto/cassandra/68]
Java 8 Unit Tests results: [https://circleci.com/gh/grighetto/cassandra/65]
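
For readers who have not opened the pull request, the wait-for-eviction idea
can be sketched roughly as follows. This is not the code from the patch: it is
a minimal, self-contained illustration, and every name in it
({{EvictionWaitSketch}}, {{endpointState}}, {{gossipStage}},
{{awaitCondition}}, {{clearAndAwaitEviction}}) is hypothetical. A plain map and
a single-threaded executor stand in for the gossiper state and the GossipStage
thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class EvictionWaitSketch
{
    // Hypothetical stand-in for the gossiper's per-endpoint state.
    static final Map<String, String> endpointState = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the single-threaded GossipStage.
    // The worker is a daemon thread so it never keeps the JVM alive.
    static final ExecutorService gossipStage = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "GossipStage-sketch");
        t.setDaemon(true);
        return t;
    });

    /** Polls until the condition holds or the timeout elapses; returns whether it held. */
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis) throws InterruptedException
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!condition.getAsBoolean())
        {
            if (System.nanoTime() - deadline >= 0)
                return false;
            Thread.sleep(10);
        }
        return true;
    }

    /** Simulates the setup step: schedule asynchronous evictions, then block until all are done. */
    static boolean clearAndAwaitEviction(long timeoutMillis) throws InterruptedException
    {
        for (String endpoint : endpointState.keySet())
            gossipStage.execute(() -> endpointState.remove(endpoint));
        // Without this wait, the next test could start while evictions are still running.
        return awaitCondition(endpointState::isEmpty, timeoutMillis);
    }

    public static void main(String[] args) throws InterruptedException
    {
        endpointState.put("127.0.0.1", "NORMAL");
        endpointState.put("127.0.0.2", "NORMAL");
        boolean clean = clearAndAwaitEviction(5_000);
        System.out.println("clean state before next test: " + clean);
    }
}
```

Returning a boolean instead of blocking forever lets a test setup method fail
fast with a clear message if the evictions never complete.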
> Fix flaky tests org.apache.cassandra.service.MoveTest testStateJumpToNormal
> and testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15551
> Project: Cassandra
> Issue Type: Bug
> Components: Test/unit
> Reporter: David Capwell
> Assignee: Gianluca Righetto
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> testStateJumpToNormal failure was on java 11
> {code}
> java.lang.NullPointerException
> at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:1028)
> at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:1023)
> at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2513)
> at org.apache.cassandra.service.StorageService.onChange(StorageService.java:2055)
> at org.apache.cassandra.Util.createInitialRing(Util.java:225)
> at org.apache.cassandra.service.MoveTest.testStateJumpToNormal(MoveTest.java:935)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes failure was on
> java 8
> {code}
> java.lang.NullPointerException
> at org.apache.cassandra.service.StorageService.updatePeerInfo(StorageService.java:2174)
> at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2511)
> at org.apache.cassandra.service.StorageService.onChange(StorageService.java:2055)
> at org.apache.cassandra.Util.createInitialRing(Util.java:225)
> at org.apache.cassandra.service.MoveTest.testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes(MoveTest.java:199)
> {code}