This is an automated email from the ASF dual-hosted git repository.
ethanfeng pushed a commit to branch branch-0.5
in repository https://gitbox.apache.org/repos/asf/celeborn.git
The following commit(s) were added to refs/heads/branch-0.5 by this push:
new 90e0f7b47 [CELEBORN-1669] Fix NullPointerException for
PartitionFilesSorter#updateSortedShuffleFiles after cleaning up expired shuffle
key
90e0f7b47 is described below
commit 90e0f7b470f07cc6813ddf7d227557e4cf69f8be
Author: SteNicholas <[email protected]>
AuthorDate: Thu Oct 24 17:47:25 2024 +0800
[CELEBORN-1669] Fix NullPointerException for
PartitionFilesSorter#updateSortedShuffleFiles after cleaning up expired shuffle
key
### What changes were proposed in this pull request?
Fix `NullPointerException` for
`PartitionFilesSorter#updateSortedShuffleFiles` after cleaning up expired
shuffle key.
### Why are the changes needed?
`PartitionFilesSorter` sorts shuffle files in `worker-file-sorter-executor`
thread and cleans up expired key in `worker-expired-shuffle-cleaner` thread.
There is a case that after `worker-expired-shuffle-cleaner` cleaning up expired
shuffle key, `worker-file-sorter-executor` updates sorted shuffle files, which
causes `NullPointerException` at present.
```
2024-10-23 17:26:17,162 [INFO] [worker-expired-shuffle-cleaner] -
org.apache.celeborn.service.deploy.worker.Worker -Logging.scala(51) -Cleaned up
expired shuffle application_1724141892576_3843182_1-0
2024-10-23 17:26:17,392 [ERROR] [worker-file-sorter-executor-237572] -
org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter
-PartitionFilesSorter.java(752) -Sorting shuffle file for
application_1724141892576_3843182_1-0-1875-0-0
/mnt/storage02/celeborn-worker/shuffle_data/application_1724141892576_3843182_1/0/1875-0-0
failed, detail:
java.lang.NullPointerException: null
at
org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter.updateSortedShuffleFiles(PartitionFilesSorter.java:455)
~[celeborn-worker_2.12-0.5.0-SNAPSHOT.jar:0.5.0-SNAPSHOT]
at
org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter$FileSorter.sort(PartitionFilesSorter.java:747)
~[celeborn-worker_2.12-0.5.0-SNAPSHOT.jar:0.5.0-SNAPSHOT]
at
org.apache.celeborn.service.deploy.worker.storage.PartitionFilesSorter.lambda$new$1(PartitionFilesSorter.java:164)
~[celeborn-worker_2.12-0.5.0-SNAPSHOT.jar:0.5.0-SNAPSHOT]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[?:1.8.0_162]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[?:1.8.0_162]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
~[?:1.8.0_162]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
~[?:1.8.0_162]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_162]
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA.
Closes #2847 from SteNicholas/CELEBORN-1669.
Authored-by: SteNicholas <[email protected]>
Signed-off-by: mingji <[email protected]>
(cherry picked from commit 4b150be0c89bd47ac2c79b65638142999a5cd949)
Signed-off-by: mingji <[email protected]>
---
.../celeborn/service/deploy/worker/storage/PartitionFilesSorter.java | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git
a/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/PartitionFilesSorter.java
b/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/PartitionFilesSorter.java
index e2da39fe2..21f2a8817 100644
---
a/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/PartitionFilesSorter.java
+++
b/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/PartitionFilesSorter.java
@@ -457,7 +457,10 @@ public class PartitionFilesSorter extends
ShuffleRecoverHelper {
@VisibleForTesting
public void updateSortedShuffleFiles(String shuffleKey, String fileId, long
fileLength) {
- sortedShuffleFiles.get(shuffleKey).add(fileId);
+ Set<String> shuffleFiles = sortedShuffleFiles.get(shuffleKey);
+ if (shuffleFiles != null) {
+ shuffleFiles.add(fileId);
+ }
sortedFileCount.incrementAndGet();
sortedFilesSize.addAndGet(fileLength);
}