This is an automated email from the ASF dual-hosted git repository.

nicholasjiang pushed a commit to branch branch-0.4
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git


The following commit(s) were added to refs/heads/branch-0.4 by this push:
     new b562f024d [CELEBORN-1339] Mark connection as timedOut in 
TransportClient.close
b562f024d is described below

commit b562f024d3e5270a50eb920733bff323cf9bd453
Author: Mridul Muralidharan <mridulatgmail.com>
AuthorDate: Wed Mar 20 08:50:14 2024 +0800

    [CELEBORN-1339] Mark connection as timedOut in TransportClient.close
    
    ### What changes were proposed in this pull request?
    
    Importing details from https://github.com/apache/spark/pull/43162:
    
    --
    This PR avoids a race condition where a connection which is in the process 
of being closed could be returned by the TransportClientFactory only to be 
immediately closed and cause errors upon use.
    
    This race condition is rare and not easily triggered, but with the upcoming 
changes to introduce SSL connection support, connection closing can take just a 
slight bit longer and it's much easier to trigger this issue.
    
    Looking at the history of the code I believe this was an oversight in 
https://github.com/apache/spark/pull/9853.
    
    --
    
    ### Why are the changes needed?
    
    We are working towards adding TLS support, which is essentially based on 
Spark 4.0 TLS support, and this is one of the fixes from there.
    (I am yet to file the overall TLS support jira yet, but this is enabling 
work).
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Unit tests
    
    Closes #2400 from mridulm/add-SPARK-45375.
    
    Authored-by: Mridul Muralidharan <mridulatgmail.com>
    Signed-off-by: SteNicholas <[email protected]>
    (cherry picked from commit 21d5698a90014f880130ea487751663d06f11650)
    Signed-off-by: SteNicholas <[email protected]>
---
 .../org/apache/celeborn/common/network/client/TransportClient.java     | 3 +++
 1 file changed, 3 insertions(+)

diff --git 
a/common/src/main/java/org/apache/celeborn/common/network/client/TransportClient.java
 
b/common/src/main/java/org/apache/celeborn/common/network/client/TransportClient.java
index 7054c2f93..5482bdf6f 100644
--- 
a/common/src/main/java/org/apache/celeborn/common/network/client/TransportClient.java
+++ 
b/common/src/main/java/org/apache/celeborn/common/network/client/TransportClient.java
@@ -317,6 +317,9 @@ public class TransportClient implements Closeable {
 
   @Override
   public void close() {
+    // Mark the connection as timed out, so we do not return a connection 
that's being closed
+    // from the TransportClientFactory if closing takes some time (e.g. with 
SSL)
+    this.timedOut = true;
     // close is a local operation and should finish with milliseconds; timeout 
just to be safe
     channel.close().awaitUninterruptibly(10, TimeUnit.SECONDS);
   }

Reply via email to