ConfX created ZOOKEEPER-5012:
--------------------------------

             Summary: NPE in QuorumPeer shutdown under restart
                 Key: ZOOKEEPER-5012
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-5012
             Project: ZooKeeper
          Issue Type: Bug
          Components: quorum, server
    Affects Versions: 3.9.4
            Reporter: ConfX


The `QuorumPeer.shutdown()` method throws a `NullPointerException` when called 
on a QuorumPeer instance that hasn't fully initialized its `zkDb` field. This 
can happen during restart scenarios when a server is being shutdown before it 
finishes initializing.
 
{*}File:{*}`zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java`
{code:java}
public void shutdown() {
    running = false;
    x509Util.close();
    if (leader != null) {           // Line 1625 - NULL CHECK EXISTS
        leader.shutdown("quorum Peer shutdown");
    }
    if (follower != null) {         // Line 1628 - NULL CHECK EXISTS
        follower.shutdown();
    }
    shutdownServerCnxnFactory();
    if (udpSocket != null) {        // Line 1632 - NULL CHECK EXISTS
        udpSocket.close();
    }
    if (jvmPauseMonitor != null) {  // Line 1635 - NULL CHECK EXISTS
        jvmPauseMonitor.serviceStop();
    }    try {
        adminServer.shutdown();
    } catch (AdminServerException e) {
        LOG.warn("Problem stopping AdminServer", e);
    }    if (getElectionAlg() != null) { // Line 1645 - NULL CHECK EXISTS
        this.interrupt();
        getElectionAlg().shutdown();
    }
    try {
        zkDb.close();               // Line 1650 - NO NULL CHECK! BUG HERE
    } catch (IOException ie) {
        LOG.warn("Error closing logs ", ie);
    }
} {code}
h2. *Why zkDb Can Be Null*
1. Default Constructor Does NOT Initialize zkDb:
{code:java}
   // QuorumPeer.java line 1069
   public QuorumPeer() throws SaslException {
       super("QuorumPeer");
       quorumStats = new QuorumStats(this);
       jmxRemotePeerBean = new HashMap<>();
       adminServer = AdminServerFactory.createAdminServer();
       x509Util = createX509Util();
       initialize();
       reconfigEnabled = QuorumPeerConfig.isReconfigEnabled();
       // NOTE: zkDb is NOT initialized here!
   } {code}
2. zkDb is Only Set Later in QuorumPeerMain.runFromConfig():
{code:java}
   // QuorumPeerMain.java lines 177-193
   quorumPeer = getQuorumPeer();  // Creates QuorumPeer with default constructor
   quorumPeer.setTxnFactory(new FileTxnSnapLog(config.getDataLogDir(), 
config.getDataDir()));
   // ... many other setter calls ...
   quorumPeer.setZKDatabase(new ZKDatabase(quorumPeer.getTxnFactory())); // 
Line 193 - zkDb is set here {code}
h2. The Race Condition Scenario
1. `MainThread.start()` creates a new `TestQPMain` and starts a new thread
2. The new thread calls `main.initializeAndRun(args)` -> `runFromConfig()`
3. In `runFromConfig()`, `quorumPeer = getQuorumPeer()` creates the QuorumPeer 
(**zkDb is null at this point**)
4. **BEFORE** line 193 `setZKDatabase()` is reached, if the restart framework 
triggers a shutdown...
5. `TestQPMain.shutdown()` calls `QuorumBase.shutdown(quorumPeer)` which calls 
`quorumPeer.shutdown()`
6. Line 1650 `zkDb.close()` throws NullPointerException
h2. Stacktrace
{code:java}
Caused by: java.lang.NullPointerException
    at 
org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:1650)
    at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:509)
    at 
org.apache.zookeeper.server.quorum.QuorumPeerTestBase$TestQPMain.shutdown(QuorumPeerTestBase.java:83)
    at 
org.apache.zookeeper.server.quorum.QuorumPeerTestBase$MainThread.shutdown(QuorumPeerTestBase.java:353)
    at 
org.restarttest.adapter.zookeeper.MainThreadAdapter.restartMainThread(MainThreadAdapter.java:129)
    ... {code}
h2. Proposed Fix
Add a null check for `zkDb` before calling `close()`, similar to how other 
fields are handled in the same method:
{code:java}
// Current buggy code (line 1649-1653):
try {
    zkDb.close();
} catch (IOException ie) {
    LOG.warn("Error closing logs ", ie);
}

// Fixed code:
if (zkDb != null) {
    try {
        zkDb.close();
    } catch (IOException ie) {
        LOG.warn("Error closing logs ", ie);
    }
} {code}
I'm happy to send a PR for this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to