This is an automated email from the ASF dual-hosted git repository.
ethanfeng pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/celeborn.git
The following commit(s) were added to refs/heads/main by this push:
new 88661c2c6 [CELEBORN-1992] Ensure hadoop FS are not closed by hadoop
ShutdownHookManager
88661c2c6 is described below
commit 88661c2c6934ff8695c85408cc02519060b534ef
Author: [email protected] <[email protected]>
AuthorDate: Wed May 7 11:28:44 2025 +0800
[CELEBORN-1992] Ensure hadoop FS are not closed by hadoop
ShutdownHookManager
### What changes were proposed in this pull request?
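Ensure Hadoop FS instances are not closed by the Hadoop ShutdownHookManager, by
setting `fs.automatic.close` to `false` in the Hadoop configuration built by
`CelebornHadoopUtils.newConfiguration`.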
### Why are the changes needed?
By default, Hadoop closes its FS instances through
[ShutdownHookManager](https://github.com/apache/hadoop/blob/b4466a3b0a772d53e948f3e440f3e8e285f12a26/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ShutdownHookManager.java).
This can lead to the FS being closed before all of its streams have been
closed.
This causes issues with S3, which tries to perform some calls through the S3
Hadoop FS to generate the index file.
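
The intended ordering can be sketched as follows. This is a minimal, standalone illustration and not Celeborn code: the object name `AutomaticCloseSketch`, the helper `newHadoopConf`, and the output file name are hypothetical, and the local FS stands in for S3 only so that the sketch runs without a cluster. It assumes that, once `fs.automatic.close` is `false`, the application is responsible for closing the FS itself after its streams.

```scala
import java.nio.charset.StandardCharsets

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical standalone sketch; names below are not part of Celeborn.
object AutomaticCloseSketch {

  // Mirrors the one-line change in this commit: opt out of Hadoop's
  // automatic FS close at JVM shutdown.
  def newHadoopConf(): Configuration = {
    val hadoopConf = new Configuration()
    hadoopConf.set("fs.automatic.close", "false")
    hadoopConf
  }

  def main(args: Array[String]): Unit = {
    val conf = newHadoopConf()
    // Local FS is an assumption made so the sketch needs no remote storage.
    val fs = FileSystem.getLocal(conf)
    // Hypothetical output location under the JVM temp directory.
    val path = new Path(System.getProperty("java.io.tmpdir"), "automatic-close-sketch.index")
    val out = fs.create(path, true) // overwrite if the file already exists
    try {
      out.write("index".getBytes(StandardCharsets.UTF_8))
    } finally {
      // Close the stream first so pending writes complete, then the FS.
      out.close()
      fs.close()
    }
  }
}
```

The point of the sketch is the order in the `finally` block: the stream is closed while the FS is still usable, which is the guarantee the shutdown hook could not give.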
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Tested on a Celeborn cluster installed on Kubernetes
- Launched a 10 TiB shuffle job
- Restarted some workers while the shuffle job was running
- The files are now fully written and jobs no longer fail when reading the
shuffle data because of missing index files. On S3, we also no longer see
incomplete files (data files at 0 B)
Closes #3241 from ashangit/nfraison/CELEBORN-1992.
Authored-by: [email protected] <[email protected]>
Signed-off-by: mingji <[email protected]>
---
.../main/scala/org/apache/celeborn/common/util/CelebornHadoopUtils.scala | 1 +
1 file changed, 1 insertion(+)
diff --git a/common/src/main/scala/org/apache/celeborn/common/util/CelebornHadoopUtils.scala b/common/src/main/scala/org/apache/celeborn/common/util/CelebornHadoopUtils.scala
index 58fde690b..3cdb57b97 100644
--- a/common/src/main/scala/org/apache/celeborn/common/util/CelebornHadoopUtils.scala
+++ b/common/src/main/scala/org/apache/celeborn/common/util/CelebornHadoopUtils.scala
@@ -34,6 +34,7 @@ object CelebornHadoopUtils extends Logging {
   private var logPrinted = new AtomicBoolean(false)
   private[celeborn] def newConfiguration(conf: CelebornConf): Configuration = {
     val hadoopConf = new Configuration()
+    hadoopConf.set("fs.automatic.close", "false")
     if (conf.hdfsDir.nonEmpty) {
       val path = new Path(conf.hdfsDir)
       val scheme = path.toUri.getScheme