[
https://issues.apache.org/jira/browse/ZOOKEEPER-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ZOOKEEPER-5012:
--------------------------------------
Labels: pull-request-available (was: )
> NPE in QuorumPeer shutdown under restart
> ----------------------------------------
>
> Key: ZOOKEEPER-5012
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-5012
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum, server
> Affects Versions: 3.9.4
> Reporter: ConfX
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The `QuorumPeer.shutdown()` method throws a `NullPointerException` when
> called on a QuorumPeer instance that hasn't fully initialized its `zkDb`
> field. This can happen during restart scenarios when a server is being
> shutdown before it finishes initializing.
>
> {*}File:{*}`zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java`
> {code:java}
> public void shutdown() {
> running = false;
> x509Util.close();
> if (leader != null) { // Line 1625 - NULL CHECK EXISTS
> leader.shutdown("quorum Peer shutdown");
> }
> if (follower != null) { // Line 1628 - NULL CHECK EXISTS
> follower.shutdown();
> }
> shutdownServerCnxnFactory();
> if (udpSocket != null) { // Line 1632 - NULL CHECK EXISTS
> udpSocket.close();
> }
> if (jvmPauseMonitor != null) { // Line 1635 - NULL CHECK EXISTS
> jvmPauseMonitor.serviceStop();
> } try {
> adminServer.shutdown();
> } catch (AdminServerException e) {
> LOG.warn("Problem stopping AdminServer", e);
> } if (getElectionAlg() != null) { // Line 1645 - NULL CHECK EXISTS
> this.interrupt();
> getElectionAlg().shutdown();
> }
> try {
> zkDb.close(); // Line 1650 - NO NULL CHECK! BUG HERE
> } catch (IOException ie) {
> LOG.warn("Error closing logs ", ie);
> }
> } {code}
> h2. *Why zkDb Can Be Null*
> 1. Default Constructor Does NOT Initialize zkDb:
> {code:java}
> // QuorumPeer.java line 1069
> public QuorumPeer() throws SaslException {
> super("QuorumPeer");
> quorumStats = new QuorumStats(this);
> jmxRemotePeerBean = new HashMap<>();
> adminServer = AdminServerFactory.createAdminServer();
> x509Util = createX509Util();
> initialize();
> reconfigEnabled = QuorumPeerConfig.isReconfigEnabled();
> // NOTE: zkDb is NOT initialized here!
> } {code}
> 2. zkDb is Only Set Later in QuorumPeerMain.runFromConfig():
> {code:java}
> // QuorumPeerMain.java lines 177-193
> quorumPeer = getQuorumPeer(); // Creates QuorumPeer with default
> constructor
> quorumPeer.setTxnFactory(new FileTxnSnapLog(config.getDataLogDir(),
> config.getDataDir()));
> // ... many other setter calls ...
> quorumPeer.setZKDatabase(new ZKDatabase(quorumPeer.getTxnFactory())); //
> Line 193 - zkDb is set here {code}
> h2. The Race Condition Scenario
> 1. `MainThread.start()` creates a new `TestQPMain` and starts a new thread
> 2. The new thread calls `main.initializeAndRun(args)` -> `runFromConfig()`
> 3. In `runFromConfig()`, `quorumPeer = getQuorumPeer()` creates the
> QuorumPeer (zkDb is null at this point)
> 4. BEFORE line 193 `setZKDatabase()` is reached, if the node restart triggers
> a shutdown...
> 5. `TestQPMain.shutdown()` calls `QuorumBase.shutdown(quorumPeer)` which
> calls `quorumPeer.shutdown()`
> 6. Line 1650 `zkDb.close()` throws NullPointerException
> h2. Stacktrace
> {code:java}
> Caused by: java.lang.NullPointerException
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:1650)
> at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:509)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerTestBase$TestQPMain.shutdown(QuorumPeerTestBase.java:83)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerTestBase$MainThread.shutdown(QuorumPeerTestBase.java:353)
> at
> org.restarttest.adapter.zookeeper.MainThreadAdapter.restartMainThread(MainThreadAdapter.java:129)
> ... {code}
> h2. Proposed Fix
> Add a null check for `zkDb` before calling `close()`, similar to how other
> fields are handled in the same method:
> {code:java}
> // Current buggy code (line 1649-1653):
> try {
> zkDb.close();
> } catch (IOException ie) {
> LOG.warn("Error closing logs ", ie);
> }
> // Fixed code:
> if (zkDb != null) {
> try {
> zkDb.close();
> } catch (IOException ie) {
> LOG.warn("Error closing logs ", ie);
> }
> } {code}
> I'm happy to send a PR for this issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)