This is an automated email from the ASF dual-hosted git repository.

rexxiong pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/celeborn.git


The following commit(s) were added to refs/heads/main by this push:
     new f25972d00 [CELEBORN-1317][FOLLOWUP] HttpServer avoid Jetty's acceptor 
thread shrink for stopping
f25972d00 is described below

commit f25972d0037dcd479af18118fa438172f1ed5ed5
Author: SteNicholas <[email protected]>
AuthorDate: Tue Apr 9 16:58:22 2024 +0800

    [CELEBORN-1317][FOLLOWUP] HttpServer avoid Jetty's acceptor thread shrink 
for stopping
    
    ### What changes were proposed in this pull request?
    
    `HttpServer` set idle timeout to 0 in order to avoid Jetty's acceptor 
thread shrink for stopping.
    
    ### Why are the changes needed?
    
    When the Jetty's acceptor thread shrinks before the main thread sends a 
signal to the thread, the issue `java.io.IOException: No such file or 
directory` could happen.
    
    Jetty's acceptor thread waits for a new connection request and blocked by 
`accept(this.fd, newfd, isaa)` in 
[sun.nio.ch.ServerSocketChannelImpl#accept](http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/sun/nio/ch/ServerSocketChannelImpl.java#l241).
    
    When `org.eclipse.jetty.server.Server.doStop` is called in the main thread, 
the thread reaches [this 
code](http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/sun/nio/ch/ServerSocketChannelImpl.java#l280).
    
    The server socket descriptor will be closed by `nd.preClose` in the main 
thread.
    Then, `accept()` in acceptor thread throws an Exception due to "Bad file 
descriptor" in case of macOS.
    After the exception is thrown, the acceptor thread will continue to [fetch 
a 
task](https://github.com/eclipse/jetty.project/blob/jetty-9.4.18.v20190429/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L783).
    If the thread obtain the `SHRINK` task 
[here](https://github.com/eclipse/jetty.project/blob/jetty-9.4.18.v20190429/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L854),
 the thread will be shrink.
    If, the acceptor thread finishes before `NativeThread.signal` is called in 
the main thread, this issue happens.
    
    Environment:
    
    - Jetty: v9.4.52.v20230823
    - JDK: Oracle JDK 1.8
    - OS: Linux version 5.10.0 (gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 
GNU ld (GNU Binutils for Debian) 2.35.2)
    
    Backport https://github.com/apache/spark/pull/28437.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    No.
    
    Closes #2450 from SteNicholas/CELEBORN-1317.
    
    Authored-by: SteNicholas <[email protected]>
    Signed-off-by: Shuang <[email protected]>
---
 .../org/apache/celeborn/server/common/http/HttpServer.scala | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git 
a/service/src/main/scala/org/apache/celeborn/server/common/http/HttpServer.scala
 
b/service/src/main/scala/org/apache/celeborn/server/common/http/HttpServer.scala
index 45d63c6b7..8592d696c 100644
--- 
a/service/src/main/scala/org/apache/celeborn/server/common/http/HttpServer.scala
+++ 
b/service/src/main/scala/org/apache/celeborn/server/common/http/HttpServer.scala
@@ -17,6 +17,8 @@
 
 package org.apache.celeborn.server.common.http
 
+import scala.util.Try
+
 import org.apache.commons.lang3.SystemUtils
 import org.eclipse.jetty.server.{Handler, HttpConfiguration, 
HttpConnectionFactory, Server, ServerConnector}
 import org.eclipse.jetty.server.handler.{ContextHandlerCollection, 
ErrorHandler}
@@ -60,11 +62,20 @@ private[celeborn] case class HttpServer(
       server.setStopTimeout(0)
       connector.setStopTimeout(0)
     }
+    val threadPool = server.getThreadPool
+    threadPool match {
+      case pool: QueuedThreadPool =>
+        // avoid Jetty's acceptor thread shrink
+        Try(pool.setIdleTimeout(0))
+      case _ =>
+    }
     logInfo(s"$role: Stopping HttpServer")
     server.stop()
     server.join()
     connector.stop()
-    server.getThreadPool match {
+    // Stop the ThreadPool if it supports stop() method (through LifeCycle).
+    // It is needed because stopping the Server won't stop the ThreadPool it 
uses.
+    threadPool match {
       case lifeCycle: LifeCycle => lifeCycle.stop()
       case _ =>
     }

Reply via email to