This is an automated email from the ASF dual-hosted git repository.
rexxiong pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/celeborn.git
The following commit(s) were added to refs/heads/main by this push:
new f25972d00 [CELEBORN-1317][FOLLOWUP] HttpServer avoid Jetty's acceptor
thread shrink for stopping
f25972d00 is described below
commit f25972d0037dcd479af18118fa438172f1ed5ed5
Author: SteNicholas <[email protected]>
AuthorDate: Tue Apr 9 16:58:22 2024 +0800
[CELEBORN-1317][FOLLOWUP] HttpServer avoid Jetty's acceptor thread shrink
for stopping
### What changes were proposed in this pull request?
`HttpServer` set idle timeout to 0 in order to avoid Jetty's acceptor
thread shrink for stopping.
### Why are the changes needed?
When the Jetty's acceptor thread shrinks before the main thread sends a
signal to the thread, the issue `java.io.IOException: No such file or
directory` could happen.
Jetty's acceptor thread waits for a new connection request and blocked by
`accept(this.fd, newfd, isaa)` in
[sun.nio.ch.ServerSocketChannelImpl#accept](http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/sun/nio/ch/ServerSocketChannelImpl.java#l241).
When `org.eclipse.jetty.server.Server.doStop` is called in the main thread,
the thread reaches [this
code](http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/sun/nio/ch/ServerSocketChannelImpl.java#l280).
The server socket descriptor will be closed by `nd.preClose` in the main
thread.
Then, `accept()` in acceptor thread throws an Exception due to "Bad file
descriptor" in case of macOS.
After the exception is thrown, the acceptor thread will continue to [fetch
a
task](https://github.com/eclipse/jetty.project/blob/jetty-9.4.18.v20190429/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L783).
If the thread obtain the `SHRINK` task
[here](https://github.com/eclipse/jetty.project/blob/jetty-9.4.18.v20190429/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L854),
the thread will be shrink.
If, the acceptor thread finishes before `NativeThread.signal` is called in
the main thread, this issue happens.
Environment:
- Jetty: v9.4.52.v20230823
- JDK: Oracle JDK 1.8
- OS: Linux version 5.10.0 (gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516,
GNU ld (GNU Binutils for Debian) 2.35.2)
Backport https://github.com/apache/spark/pull/28437.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2450 from SteNicholas/CELEBORN-1317.
Authored-by: SteNicholas <[email protected]>
Signed-off-by: Shuang <[email protected]>
---
.../org/apache/celeborn/server/common/http/HttpServer.scala | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git
a/service/src/main/scala/org/apache/celeborn/server/common/http/HttpServer.scala
b/service/src/main/scala/org/apache/celeborn/server/common/http/HttpServer.scala
index 45d63c6b7..8592d696c 100644
---
a/service/src/main/scala/org/apache/celeborn/server/common/http/HttpServer.scala
+++
b/service/src/main/scala/org/apache/celeborn/server/common/http/HttpServer.scala
@@ -17,6 +17,8 @@
package org.apache.celeborn.server.common.http
+import scala.util.Try
+
import org.apache.commons.lang3.SystemUtils
import org.eclipse.jetty.server.{Handler, HttpConfiguration,
HttpConnectionFactory, Server, ServerConnector}
import org.eclipse.jetty.server.handler.{ContextHandlerCollection,
ErrorHandler}
@@ -60,11 +62,20 @@ private[celeborn] case class HttpServer(
server.setStopTimeout(0)
connector.setStopTimeout(0)
}
+ val threadPool = server.getThreadPool
+ threadPool match {
+ case pool: QueuedThreadPool =>
+ // avoid Jetty's acceptor thread shrink
+ Try(pool.setIdleTimeout(0))
+ case _ =>
+ }
logInfo(s"$role: Stopping HttpServer")
server.stop()
server.join()
connector.stop()
- server.getThreadPool match {
+ // Stop the ThreadPool if it supports stop() method (through LifeCycle).
+ // It is needed because stopping the Server won't stop the ThreadPool it
uses.
+ threadPool match {
case lifeCycle: LifeCycle => lifeCycle.stop()
case _ =>
}