ConfX created HADOOP-18800:
------------------------------

             Summary: Bad ipc.client.connection.idle-scan-interval.ms causes resource leaks
                 Key: HADOOP-18800
                 URL: https://issues.apache.org/jira/browse/HADOOP-18800
             Project: Hadoop Common
          Issue Type: Bug
          Components: common, conf, ipc
            Reporter: ConfX
         Attachments: reproduce.sh

When ipc.client.connection.idle-scan-interval.ms is set to a bad value (e.g. a 
negative value), the Hadoop IPC Server fails to schedule the idle connection 
scan task, which causes resource leaks.

h2. Buggy code:
{code:java}
private void scheduleIdleScanTask() {
  ...
  TimerTask idleScanTask = new TimerTask(){
    @Override
    public void run() {
      ...
      try {
        closeIdle(false);
      } finally {
        // explicitly reschedule so next execution occurs relative
        // to the end of this scan, not the beginning
        scheduleIdleScanTask();
      }
    }
  };
  // <--- idleScanInterval is a negative value
  idleScanTimer.schedule(idleScanTask, idleScanInterval);
}
{code}

java.util.Timer#schedule throws an IllegalArgumentException when the delay is 
negative, so idleScanTask is never scheduled. The missing idle scan is what 
causes the resource leaks.
{code:java}
public void schedule(TimerTask task, long delay) {
    if (delay < 0)
        throw new IllegalArgumentException("Negative delay.");
    // <-- the task will not be scheduled when delay is negative
    sched(task, System.currentTimeMillis()+delay, 0);
}
{code}
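
The JDK behavior described above can be checked in isolation. The following is a minimal standalone sketch, not Hadoop code: the class name NegativeDelayDemo and the helper scheduleRejects are made up for illustration, and the empty TimerTask only stands in for the real idle scan.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical demo class; it only exercises java.util.Timer, not Hadoop.
public class NegativeDelayDemo {

  /** Returns true if Timer.schedule rejects the delay with IllegalArgumentException. */
  static boolean scheduleRejects(long delayMs) {
    Timer timer = new Timer("idle-scan-demo", true); // daemon thread
    TimerTask task = new TimerTask() {
      @Override
      public void run() {
        // placeholder for the real scan, e.g. closeIdle(false)
      }
    };
    try {
      timer.schedule(task, delayMs);
      return false; // the task was accepted by the timer
    } catch (IllegalArgumentException e) {
      return true;  // negative delay: the task was never queued
    } finally {
      timer.cancel(); // release the timer thread
    }
  }

  public static void main(String[] args) {
    System.out.println("delay=-1 rejected: " + scheduleRejects(-1));
    System.out.println("delay=10 rejected: " + scheduleRejects(10));
  }
}
```

With a negative delay the exception is thrown before the task is queued, so any rescheduling done inside run() never gets a chance to execute.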

h2. How to reproduce:
The test org.apache.hadoop.ipc.TestIPC#testSocketLeak can be used to check for 
the resource leak.
(1) Set ipc.client.connection.idle-scan-interval.ms to -1.
(2) Run the test org.apache.hadoop.ipc.TestIPC#testSocketLeak.
(3) You will see the following message (note that the number of leaked 
descriptors can vary from run to run):
{code}
java.lang.AssertionError: Leaked 142 file descriptors
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.apache.hadoop.ipc.TestIPC.testSocketLeak(TestIPC.java:1155)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.lang.Thread.run(Thread.java:829)
{code}

The attached reproduce.sh can be used to reproduce the bug.

We are happy to provide a patch if this issue is confirmed. 
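
As a rough illustration only (this is not the proposed patch), one possible guard is to validate the configured interval before it reaches Timer#schedule. Everything named here is hypothetical: the class IdleScanIntervalGuard, the method sanitize, and the fallback value FALLBACK_INTERVAL_MS are assumptions made for the sketch and are not taken from the Hadoop source.

```java
// Hypothetical sketch; names and the fallback value are illustrative assumptions.
public class IdleScanIntervalGuard {

  // Assumed fallback interval in milliseconds, chosen only for this example.
  static final long FALLBACK_INTERVAL_MS = 10_000L;

  /**
   * Returns an interval that Timer#schedule will accept:
   * negative values are replaced by the fallback, others pass through.
   */
  static long sanitize(long configuredMs) {
    if (configuredMs < 0) {
      return FALLBACK_INTERVAL_MS;
    }
    return configuredMs;
  }

  public static void main(String[] args) {
    System.out.println(sanitize(-1));
    System.out.println(sanitize(5_000));
  }
}
```

Whether a bad value should fall back silently, log a warning, or fail fast at startup is a design decision for the actual patch.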


