This is an automated email from the ASF dual-hosted git repository.
nicholasjiang pushed a commit to branch branch-0.4
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git
The following commit(s) were added to refs/heads/branch-0.4 by this push:
new b562f024d [CELEBORN-1339] Mark connection as timedOut in
TransportClient.close
b562f024d is described below
commit b562f024d3e5270a50eb920733bff323cf9bd453
Author: Mridul Muralidharan <mridulatgmail.com>
AuthorDate: Wed Mar 20 08:50:14 2024 +0800
[CELEBORN-1339] Mark connection as timedOut in TransportClient.close
### What changes were proposed in this pull request?
Importing details from https://github.com/apache/spark/pull/43162:
--
This PR avoids a race condition where a connection which is in the process
of being closed could be returned by the TransportClientFactory only to be
immediately closed and cause errors upon use.
This race condition is rare and not easily triggered, but with the upcoming
changes to introduce SSL connection support, connection closing can take just a
slight bit longer and it's much easier to trigger this issue.
Looking at the history of the code I believe this was an oversight in
https://github.com/apache/spark/pull/9853.
--
### Why are the changes needed?
We are working towards adding TLS support, which is essentially based on
Spark 4.0 TLS support, and this is one of the fixes from there.
(I am yet to file the overall TLS support jira yet, but this is enabling
work).
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests
Closes #2400 from mridulm/add-SPARK-45375.
Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: SteNicholas <[email protected]>
(cherry picked from commit 21d5698a90014f880130ea487751663d06f11650)
Signed-off-by: SteNicholas <[email protected]>
---
.../org/apache/celeborn/common/network/client/TransportClient.java | 3 +++
1 file changed, 3 insertions(+)
diff --git
a/common/src/main/java/org/apache/celeborn/common/network/client/TransportClient.java
b/common/src/main/java/org/apache/celeborn/common/network/client/TransportClient.java
index 7054c2f93..5482bdf6f 100644
---
a/common/src/main/java/org/apache/celeborn/common/network/client/TransportClient.java
+++
b/common/src/main/java/org/apache/celeborn/common/network/client/TransportClient.java
@@ -317,6 +317,9 @@ public class TransportClient implements Closeable {
@Override
public void close() {
+ // Mark the connection as timed out, so we do not return a connection
that's being closed
+ // from the TransportClientFactory if closing takes some time (e.g. with
SSL)
+ this.timedOut = true;
// close is a local operation and should finish with milliseconds; timeout
just to be safe
channel.close().awaitUninterruptibly(10, TimeUnit.SECONDS);
}