[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-5012:
--------------------------------------
    Labels: pull-request-available  (was: )

> NPE in QuorumPeer shutdown under restart
> ----------------------------------------
>
>                 Key: ZOOKEEPER-5012
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-5012
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.9.4
>            Reporter: ConfX
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The `QuorumPeer.shutdown()` method throws a `NullPointerException` when 
> called on a QuorumPeer instance that hasn't fully initialized its `zkDb` 
> field. This can happen during restart scenarios when a server is being 
> shutdown before it finishes initializing.
>  
> {*}File:{*}`zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java`
> {code:java}
> public void shutdown() {
>     running = false;
>     x509Util.close();
>     if (leader != null) {           // Line 1625 - NULL CHECK EXISTS
>         leader.shutdown("quorum Peer shutdown");
>     }
>     if (follower != null) {         // Line 1628 - NULL CHECK EXISTS
>         follower.shutdown();
>     }
>     shutdownServerCnxnFactory();
>     if (udpSocket != null) {        // Line 1632 - NULL CHECK EXISTS
>         udpSocket.close();
>     }
>     if (jvmPauseMonitor != null) {  // Line 1635 - NULL CHECK EXISTS
>         jvmPauseMonitor.serviceStop();
>     }    try {
>         adminServer.shutdown();
>     } catch (AdminServerException e) {
>         LOG.warn("Problem stopping AdminServer", e);
>     }    if (getElectionAlg() != null) { // Line 1645 - NULL CHECK EXISTS
>         this.interrupt();
>         getElectionAlg().shutdown();
>     }
>     try {
>         zkDb.close();               // Line 1650 - NO NULL CHECK! BUG HERE
>     } catch (IOException ie) {
>         LOG.warn("Error closing logs ", ie);
>     }
> } {code}
> h2. *Why zkDb Can Be Null*
> 1. Default Constructor Does NOT Initialize zkDb:
> {code:java}
>    // QuorumPeer.java line 1069
>    public QuorumPeer() throws SaslException {
>        super("QuorumPeer");
>        quorumStats = new QuorumStats(this);
>        jmxRemotePeerBean = new HashMap<>();
>        adminServer = AdminServerFactory.createAdminServer();
>        x509Util = createX509Util();
>        initialize();
>        reconfigEnabled = QuorumPeerConfig.isReconfigEnabled();
>        // NOTE: zkDb is NOT initialized here!
>    } {code}
> 2. zkDb is Only Set Later in QuorumPeerMain.runFromConfig():
> {code:java}
>    // QuorumPeerMain.java lines 177-193
>    quorumPeer = getQuorumPeer();  // Creates QuorumPeer with default 
> constructor
>    quorumPeer.setTxnFactory(new FileTxnSnapLog(config.getDataLogDir(), 
> config.getDataDir()));
>    // ... many other setter calls ...
>    quorumPeer.setZKDatabase(new ZKDatabase(quorumPeer.getTxnFactory())); // 
> Line 193 - zkDb is set here {code}
> h2. The Race Condition Scenario
> 1. `MainThread.start()` creates a new `TestQPMain` and starts a new thread
> 2. The new thread calls `main.initializeAndRun(args)` -> `runFromConfig()`
> 3. In `runFromConfig()`, `quorumPeer = getQuorumPeer()` creates the 
> QuorumPeer (zkDb is null at this point)
> 4. BEFORE line 193 `setZKDatabase()` is reached, if the node restart triggers 
> a shutdown...
> 5. `TestQPMain.shutdown()` calls `QuorumBase.shutdown(quorumPeer)` which 
> calls `quorumPeer.shutdown()`
> 6. Line 1650 `zkDb.close()` throws NullPointerException
> h2. Stacktrace
> {code:java}
> Caused by: java.lang.NullPointerException
>     at 
> org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:1650)
>     at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:509)
>     at 
> org.apache.zookeeper.server.quorum.QuorumPeerTestBase$TestQPMain.shutdown(QuorumPeerTestBase.java:83)
>     at 
> org.apache.zookeeper.server.quorum.QuorumPeerTestBase$MainThread.shutdown(QuorumPeerTestBase.java:353)
>     at 
> org.restarttest.adapter.zookeeper.MainThreadAdapter.restartMainThread(MainThreadAdapter.java:129)
>     ... {code}
> h2. Proposed Fix
> Add a null check for `zkDb` before calling `close()`, similar to how other 
> fields are handled in the same method:
> {code:java}
> // Current buggy code (line 1649-1653):
> try {
>     zkDb.close();
> } catch (IOException ie) {
>     LOG.warn("Error closing logs ", ie);
> }
> // Fixed code:
> if (zkDb != null) {
>     try {
>         zkDb.close();
>     } catch (IOException ie) {
>         LOG.warn("Error closing logs ", ie);
>     }
> } {code}
> I'm happy to send a PR for this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to