ashangit opened a new pull request, #3241:
URL: https://github.com/apache/celeborn/pull/3241

   
   ### What changes were proposed in this pull request?
   
   Ensure Hadoop FileSystem (FS) instances are not closed by Hadoop's ShutdownHookManager.
   
   ### Why are the changes needed?
   
   By default, Hadoop manages the closing of Hadoop FS instances through a
[ShutdownHookManager](https://github.com/apache/hadoop/blob/b4466a3b0a772d53e948f3e440f3e8e285f12a26/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ShutdownHookManager.java).
   
   This can lead to the FS being closed before all streams have been closed.
   
   This causes issues with S3 storage, which needs to make calls through the S3 Hadoop FS to generate the index file.
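
   The ordering problem can be illustrated with a small, self-contained sketch. It does not use Hadoop or Celeborn code: `FakeFs`, `FakeWriter`, and `shutdown` are hypothetical stand-ins, and the assumption that the index file is written when the writer is closed is a simplification of the behavior described here, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownOrderSketch {

    /** Hypothetical stand-in for a Hadoop FS: usable until it is closed. */
    static class FakeFs {
        private boolean closed = false;
        final List<String> files = new ArrayList<>();

        void create(String path) {
            if (closed) {
                throw new IllegalStateException("Filesystem closed");
            }
            files.add(path);
        }

        void close() {
            closed = true;
        }
    }

    /** Hypothetical shuffle writer that needs the FS again when it is closed. */
    static class FakeWriter {
        private final FakeFs fs;
        private final String dataPath;

        FakeWriter(FakeFs fs, String dataPath) {
            this.fs = fs;
            this.dataPath = dataPath;
            fs.create(dataPath);
        }

        /** Assumption: finalizing the data file writes an index file through the same FS. */
        void close() {
            fs.create(dataPath + ".index");
        }
    }

    /** Runs a shutdown in the given order; returns true if the index file was written. */
    public static boolean shutdown(boolean closeFsFirst) {
        FakeFs fs = new FakeFs();
        FakeWriter writer = new FakeWriter(fs, "shuffle-0");
        try {
            if (closeFsFirst) {
                fs.close();      // what an early FS shutdown hook would do
                writer.close();  // fails: the FS is already closed
            } else {
                writer.close();  // index file is written while the FS is still open
                fs.close();
            }
        } catch (IllegalStateException e) {
            return false;
        }
        return fs.files.contains("shuffle-0.index");
    }

    public static void main(String[] args) {
        System.out.println("FS closed first, index written: " + shutdown(true));
        System.out.println("Writer closed first, index written: " + shutdown(false));
    }
}
```

   In this sketch the index file is only produced when the writer is closed before the FS, which is the ordering this change aims to guarantee by keeping the FS out of the automatic shutdown hook.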
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   - Tested on a Celeborn cluster installed on Kubernetes:
     - launched a 10 TiB shuffle job
     - restarted some workers while the shuffle job was running
     - the files are now properly completed, and jobs no longer fail when reading the shuffle data due to missing index files. On S3, we also no longer see incomplete files (data files at 0 B)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
