SteNicholas opened a new pull request, #2084:
URL: https://github.com/apache/incubator-celeborn/pull/2084

   ### What changes were proposed in this pull request?
   
   `ShuffleClientImpl` closes `batchReviveRequestScheduler` of `ReviveManager`.
   
   ### Why are the changes needed?
   
   After shuffle client is closed, `ReviveManager` still schedules invoker to 
`ShuffleClientImpl#reviveBatch`, which causes the `NullPointerException`. 
Therefore, `ShuffleClientImpl` should close `batchReviveRequestScheduler` of 
`ReviveManager` to avoid `NullPointerException`.
   
   ```
   23/11/08 18:09:25,819 [batch-revive-scheduler] ERROR ShuffleClientImpl: 
Exception raised while reviving for shuffle 0 partitionIds 1988, epochs 0,.
   java.lang.NullPointerException
        at 
org.apache.celeborn.client.ShuffleClientImpl.reviveBatch(ShuffleClientImpl.java:705)
        at 
org.apache.celeborn.client.ReviveManager.lambda$new$1(ReviveManager.java:94)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   23/11/08 18:09:25,844 [celeborn-retry-sender-6] ERROR ShuffleClientImpl: 
Push data to xx.xx.xx.xx:9092 failed for shuffle 0 map 216 attempt 0 partition 
1988 batch 2623, remain revive times 4.
   org.apache.celeborn.common.exception.CelebornIOException: 
PUSH_DATA_CONNECTION_EXCEPTION_PRIMARY then revive but REVIVE_FAILED, revive 
status 12(REVIVE_FAILED), old location: PartitionLocation[
     id-epoch:1988-0
     
host-rpcPort-pushPort-fetchPort-replicatePort:xx.xx.xx.xx-9091-9092-9093-9094
     mode:PRIMARY
     peer:(empty)
     storage hint:StorageInfo{type=MEMORY, mountPoint='/tmp/storage', 
finalResult=false, filePath=}
     mapIdBitMap:null]
        at 
org.apache.celeborn.client.ShuffleClientImpl.submitRetryPushData(ShuffleClientImpl.java:261)
        at 
org.apache.celeborn.client.ShuffleClientImpl.access$600(ShuffleClientImpl.java:62)
        at 
org.apache.celeborn.client.ShuffleClientImpl$3.lambda$onFailure$1(ShuffleClientImpl.java:1045)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748) 
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to