ConfX created HADOOP-18800:
------------------------------
Summary: Bad ipc.client.connection.idle-scan-interval.ms cause
resource leaks
Key: HADOOP-18800
URL: https://issues.apache.org/jira/browse/HADOOP-18800
Project: Hadoop Common
Issue Type: Bug
Components: common, conf, ipc
Reporter: ConfX
Attachments: reproduce.sh
When ipc.client.connection.idle-scan-interval.ms is set to an invalid value
(e.g. a negative value), the Hadoop IPC server fails to schedule the idle
connection scan task, which causes resource leaks.
h2. Buggy code:
{code:java}
private void scheduleIdleScanTask() {
  ...
  TimerTask idleScanTask = new TimerTask(){
    @Override
    public void run() {
      ...
      try {
        closeIdle(false);
      } finally {
        // explicitly reschedule so next execution occurs relative
        // to the end of this scan, not the beginning
        scheduleIdleScanTask();
      }
    }
  };
  // <--- idleScanInterval is a negative value
  idleScanTimer.schedule(idleScanTask, idleScanInterval);
}
{code}
java.util.Timer#schedule throws an IllegalArgumentException when the delay is
negative, so idleScanTask is never scheduled. Because the idle scan never runs,
resources leak.
{code:java}
public void schedule(TimerTask task, long delay) {
    // <-- thrown when delay is negative, so the task is never scheduled
    if (delay < 0)
        throw new IllegalArgumentException("Negative delay.");
    sched(task, System.currentTimeMillis()+delay, 0);
}
{code}
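The behavior of java.util.Timer described above can be checked in isolation. The following is a minimal sketch using only the JDK, not Hadoop code; the class name NegativeDelayDemo and the helper trySchedule are hypothetical names chosen for illustration, and the empty task body stands in for the real idle scan.

```java
import java.util.Timer;
import java.util.TimerTask;

public class NegativeDelayDemo {

    /**
     * Tries to schedule a no-op task with the given delay.
     * Returns null if scheduling succeeded, or the exception
     * message if Timer#schedule rejected the delay.
     */
    static String trySchedule(long delayMs) {
        // Daemon timer thread so the demo never blocks JVM exit.
        Timer timer = new Timer("idle-scan-demo", true);
        try {
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    // stand-in for the idle connection scan
                }
            }, delayMs);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("delay=10 -> " + trySchedule(10));
        System.out.println("delay=-1 -> " + trySchedule(-1));
    }
}
```

With a delay of 10 the call returns normally; with a delay of -1 the helper returns the "Negative delay." message, which matches the check in the JDK source quoted above.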
h2. How to reproduce:
We can use the test org.apache.hadoop.ipc.TestIPC#testSocketLeak to check the
resource leaks.
(1) Set ipc.client.connection.idle-scan-interval.ms to -1;
(2) Run test org.apache.hadoop.ipc.TestIPC#testSocketLeak
(3) You will see the following message (note that the number of leaked
descriptors can vary from run to run):
{code}
java.lang.AssertionError: Leaked 142 file descriptors
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.apache.hadoop.ipc.TestIPC.testSocketLeak(TestIPC.java:1155)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:829)
{code}
You can use the attached reproduce.sh to reproduce the bug.
We are happy to provide a patch if this issue is confirmed.
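One possible direction for such a patch is to validate the interval before it reaches the timer. The sketch below is only an illustration under stated assumptions, not the actual Hadoop fix: the class IdleScanIntervalGuard, the helper sanitize, and the 10000 ms fallback are all hypothetical and should be checked against the real default in the Hadoop version in use.

```java
public class IdleScanIntervalGuard {

    // Assumed fallback in milliseconds (hypothetical; verify against
    // the default shipped with your Hadoop version).
    static final long FALLBACK_INTERVAL_MS = 10_000L;

    /**
     * Hypothetical helper: returns the configured interval if
     * java.util.Timer would accept it (non-negative), otherwise
     * logs a warning and returns the fallback.
     */
    static long sanitize(long configuredMs) {
        if (configuredMs < 0) {
            System.err.println("Invalid idle scan interval " + configuredMs
                + " ms; falling back to " + FALLBACK_INTERVAL_MS + " ms");
            return FALLBACK_INTERVAL_MS;
        }
        return configuredMs;
    }

    public static void main(String[] args) {
        System.out.println(sanitize(-1));
        System.out.println(sanitize(5000));
    }
}
```

An alternative would be to reject the invalid value at startup with a clear error message instead of silently substituting a fallback; which behavior is preferable is a design decision for the maintainers.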