zwangsheng commented on PR #2536:
URL: https://github.com/apache/celeborn/pull/2536#issuecomment-2144174749

   Well, it seems I misjudged the issue.
   
   First, this NPE occurs when a Celeborn Worker is gracefully shut down and
restarted while Spark applications are still running. After the restart, we
observe on the dashboard that the Celeborn Worker reports an incorrect number
of running applications.
   
   On the Spark side, tasks fail with an RPC failure caused by the NPE. (I now
realize that this error does not seem to be related to this PR at all; it is
thrown by the following code:)
   
https://github.com/apache/celeborn/blob/e5f09ce4e06154c3cadd1ff13a9f304b672e9cdb/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ChunkStreamManager.java#L197-L200


-- 
This is an automated message from the Apache Git Service.