adoroszlai opened a new pull request #581:
URL: https://github.com/apache/ratis/pull/581
## What changes were proposed in this pull request?
`RaftServerImpl` creates a new `FollowerState` for each vote request
received. The old `FollowerState` may trigger change to candidate role even
after it was shutdown by the server. Both `RaftServerImpl#requestVote` and the
change to candidate by `FollowerState` requires lock on `server`
(`RaftServerImpl` object). The order of events which triggers the problem:
* `RaftServerImpl` executing `requestVote` [with lock on
itself](https://github.com/apache/ratis/blob/b57be797fe97c00abc0e3d7353d30e1b2961714d/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1099)
* `FollowerState` [checks conditions, then waits for lock on
`server`](https://github.com/apache/ratis/blob/b57be797fe97c00abc0e3d7353d30e1b2961714d/ratis-server/src/main/java/org/apache/ratis/server/impl/FollowerState.java#L126-L130)
* `RaftServerImpl` [shuts down old `FollowerState`, starts new
one](https://github.com/apache/ratis/blob/b57be797fe97c00abc0e3d7353d30e1b2961714d/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L475-L477)
* `RaftServerImpl` [sets last RPC time on new
`FollowerState`](https://github.com/apache/ratis/blob/b57be797fe97c00abc0e3d7353d30e1b2961714d/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1116-L1117)
* `RaftServerImpl` [releases
lock](https://github.com/apache/ratis/blob/b57be797fe97c00abc0e3d7353d30e1b2961714d/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1127)
* *old* `FollowerState` [enters block with lock on `server`, triggers
change to
candidate](https://github.com/apache/ratis/blob/b57be797fe97c00abc0e3d7353d30e1b2961714d/ratis-server/src/main/java/org/apache/ratis/server/impl/FollowerState.java#L130-L138)
Let `FollowerState` check if it is still running after entering the
`synchronized` block.
https://issues.apache.org/jira/browse/RATIS-1490
## How was this patch tested?
Tweaked `FollowerState`, `RaftServerImpl` and `LeaderElectionTests` with
code injection. It required too much test-specific code, even in
`synchronized` blocks, so I think it should not be part of this patch. Ideas
for unit tests welcome.
Regular CI:
https://github.com/adoroszlai/incubator-ratis/runs/4845475132
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]