[
https://issues.apache.org/jira/browse/ZOOKEEPER-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354802#comment-17354802
]
Francesco Nigro edited comment on ZOOKEEPER-4309 at 6/1/21, 7:20 AM:
---------------------------------------------------------------------
There are other parts in the zookeeper source code where a similar pattern is
used and executor::shutdown is called to save the leak to happen.
The leak happen while halting QuorumCnxManager: listenerHandlers task can be
already completed or are going to complete on halt, but the executorservice's
thread are still alive, and given that non-daemon threads are GC roots they
won't be garbage collected, although not reacheable.
An ExecutorService which non-daemon threads are still alive prevent a main
thread to complete.
As example of the existing code patterns that correctly shutdown the executor,
you can find
https://github.com/franz1981/zookeeper/blob/b4f9aab099880ba8ef08eaff697debe6cdeae057/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L456-L468
{code:java}
ExecutorService executor =
Executors.newFixedThreadPool(serverSockets.size());
CountDownLatch latch = new CountDownLatch(serverSockets.size());
serverSockets.forEach(serverSocket ->
executor.submit(new
LearnerCnxAcceptorHandler(serverSocket, latch)));
try {
latch.await();
} catch (InterruptedException ie) {
LOG.error("Interrupted while sleeping in
LearnerCnxAcceptor.", ie);
} finally {
closeSockets();
executor.shutdown();
{code}
executor::shutdown is used to prevent the just created executor to leak, as
explained above.
A sample program that shows this ExecutorService behaviour is:
{code:java}
private static final boolean STOP_EXECUTOR = false;
public static void main(String[] args) throws InterruptedException {
final CountDownLatch interruptibleTask = new CountDownLatch(1);
final CountDownLatch completed = new CountDownLatch(1);
final CountDownLatch starting = new CountDownLatch(1);
final ExecutorService service = Executors.newFixedThreadPool(1);
service.submit(() -> {
try {
starting.countDown();
interruptibleTask.await();
System.err.println("Graceful stop");
} catch (InterruptedException e) {
System.err.println("This cannot happen");
} finally {
completed.countDown();
}
});
if (STOP_EXECUTOR) {
service.shutdown();
}
starting.await();
System.err.println("Gracefully stop the submitted task");
interruptibleTask.countDown();
System.err.println("Await interruptible task to complete");
completed.await();
if (STOP_EXECUTOR) {
System.err.println("The application can correctly complete and
there is no leak");
} else {
System.err.println("Perform some full GC and take an heap dump:
you'll se the service Thread still alive");
System.err.println("And this application won't stop");
}
}
{code}
Setting STOP_EXECUTOR = false shows that the application never end because of
the Thread leak, while STOP_EXECUTOR = true just make it works, and the
application can exit with code 0.
Hope that's enough to explain what's the issue here.
was (Author: nigrofranz):
There are other parts in the zookeeper source code where a similar pattern is
used and executor::shutdown is called to save the leak to happen.
The leak happen while halting QuorumCnxManager: listenerHandlers task can be
already completed or are going to complete on halt, but the executorservice's
thread are still alive, and given that non-daemon threads are GC roots they
won't be garbage collected, although not reacheable.
An ExecutorService which non-daemon threads are still alive prevent a main
thread to complete.
An example of the existing code patterns that correctly shutdown the executor,
you can find
https://github.com/franz1981/zookeeper/blob/b4f9aab099880ba8ef08eaff697debe6cdeae057/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L456-L468
{code:java}
ExecutorService executor =
Executors.newFixedThreadPool(serverSockets.size());
CountDownLatch latch = new CountDownLatch(serverSockets.size());
serverSockets.forEach(serverSocket ->
executor.submit(new
LearnerCnxAcceptorHandler(serverSocket, latch)));
try {
latch.await();
} catch (InterruptedException ie) {
LOG.error("Interrupted while sleeping in
LearnerCnxAcceptor.", ie);
} finally {
closeSockets();
executor.shutdown();
{code}
executor::shutdown is used to prevent the just created executor to leak, as
explained above.
A sample program that shows this ExecutorService behaviour is:
{code:java}
private static final boolean STOP_EXECUTOR = false;
public static void main(String[] args) throws InterruptedException {
final CountDownLatch interruptibleTask = new CountDownLatch(1);
final CountDownLatch completed = new CountDownLatch(1);
final CountDownLatch starting = new CountDownLatch(1);
final ExecutorService service = Executors.newFixedThreadPool(1);
service.submit(() -> {
try {
starting.countDown();
interruptibleTask.await();
System.err.println("Graceful stop");
} catch (InterruptedException e) {
System.err.println("This cannot happen");
} finally {
completed.countDown();
}
});
if (STOP_EXECUTOR) {
service.shutdown();
}
starting.await();
System.err.println("Gracefully stop the submitted task");
interruptibleTask.countDown();
System.err.println("Await interruptible task to complete");
completed.await();
if (STOP_EXECUTOR) {
System.err.println("The application can correctly complete and
there is no leak");
} else {
System.err.println("Perform some full GC and take an heap dump:
you'll se the service Thread still alive");
System.err.println("And this application won't stop");
}
}
{code}
Setting STOP_EXECUTOR = false shows that the application never end because of
the Thread leak, while STOP_EXECUTOR = true just make it works, and the
application can exit with code 0.
Hope that's enough to explain what's the issue here.
> QuorumCnxManager's ListenerHandler thread leak
> ----------------------------------------------
>
> Key: ZOOKEEPER-4309
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4309
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.6.3, 3.7.0
> Reporter: Francesco Nigro
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> QuorumCnxManager::Listener::run is creating a
> Executors.newFixedThreadPool(addresses.size()) without shutting it down after
> ListenerHandler task has been completed causing it to leak.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)