ConfX created HADOOP-18800:
------------------------------

             Summary: Bad ipc.client.connection.idle-scan-interval.ms cause resource leaks
                 Key: HADOOP-18800
                 URL: https://issues.apache.org/jira/browse/HADOOP-18800
             Project: Hadoop Common
          Issue Type: Bug
          Components: common, conf, ipc
            Reporter: ConfX
         Attachments: reproduce.sh

When ipc.client.connection.idle-scan-interval.ms is set to an invalid value (e.g. a negative value), the Hadoop IPC server fails to schedule the idle connection scan task, which causes resource leaks.

h2. Buggy code:

{code:java}
private void scheduleIdleScanTask() {
  ...
  TimerTask idleScanTask = new TimerTask(){
    @Override
    public void run() {
      ...
      try {
        closeIdle(false);
      } finally {
        // explicitly reschedule so next execution occurs relative
        // to the end of this scan, not the beginning
        scheduleIdleScanTask();
      }
    }
  };
  idleScanTimer.schedule(idleScanTask, idleScanInterval); // <--- idleScanInterval is a negative value
}
{code}

java.util.Timer#schedule throws an IllegalArgumentException when the delay is negative, so idleScanTask is never scheduled. Because each scan is rescheduled from the finally block of the previous one, the periodic idle scan never runs, which leaks resources.

{code:java}
public void schedule(TimerTask task, long delay) {
    if (delay < 0)
        throw new IllegalArgumentException("Negative delay.");
    sched(task, System.currentTimeMillis()+delay, 0); // <-- never reached when delay is negative
}
{code}

h2. How to reproduce:

The test org.apache.hadoop.ipc.TestIPC#testSocketLeak can be used to check for the resource leak.

(1) Set ipc.client.connection.idle-scan-interval.ms to -1.
(2) Run the test org.apache.hadoop.ipc.TestIPC#testSocketLeak.
(3) The test fails with the following message (the number of leaked descriptors can vary from run to run):

{code}
java.lang.AssertionError: Leaked 142 file descriptors
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.apache.hadoop.ipc.TestIPC.testSocketLeak(TestIPC.java:1155)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:829)
{code}

The attached reproduce.sh can be used to reproduce the bug easily.

We are happy to provide a patch if this issue is confirmed.
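
The java.util.Timer behavior described in this report can also be checked in isolation, without a Hadoop build. The following is a minimal sketch under stated assumptions, not the proposed patch: the class NegativeDelayDemo, the helper sanitizeInterval, and the constant FALLBACK_INTERVAL_MS are hypothetical names introduced for illustration, and the fallback value is not taken from the Hadoop source. It shows that Timer#schedule rejects a negative delay, and one possible way a configured interval could be validated before it reaches the timer.

```java
import java.util.Timer;
import java.util.TimerTask;

class NegativeDelayDemo {

    // Hypothetical fallback used only in this sketch; not the Hadoop default.
    static final long FALLBACK_INTERVAL_MS = 10_000L;

    /** Returns true if java.util.Timer refuses to schedule a task with this delay. */
    static boolean scheduleRejects(long delayMs) {
        Timer timer = new Timer("idle-scan-demo", true); // daemon thread
        try {
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    // no-op: only the scheduling call matters here
                }
            }, delayMs);
            return false; // the task was accepted
        } catch (IllegalArgumentException e) {
            return true;  // thrown for a negative delay; the task is never queued
        } finally {
            timer.cancel(); // release the timer thread
        }
    }

    /**
     * Hypothetical guard: replaces a non-positive configured interval with a
     * fallback so that the value passed to Timer#schedule is always valid.
     */
    static long sanitizeInterval(long configuredMs) {
        return configuredMs > 0 ? configuredMs : FALLBACK_INTERVAL_MS;
    }

    public static void main(String[] args) {
        System.out.println("delay -1 rejected: " + scheduleRejects(-1));
        System.out.println("delay 10 rejected: " + scheduleRejects(10));
        System.out.println("sanitized -1: " + sanitizeInterval(-1));
    }
}
```

Whether an invalid value should fall back to a default, as sketched here, or be rejected with an explicit configuration error is a design choice left open by this report.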