TheodoreLx commented on PR #3554: URL: https://github.com/apache/celeborn/pull/3554#issuecomment-3630598517
<img width="1822" height="507" alt="image" src="https://github.com/user-attachments/assets/280f96fc-d313-4d0d-9df0-97229b27ae30" />

This screenshot shows the driver logs from a previous task. Whenever the sum of an RPC message's `queueTime` and `processTime` exceeds 1 second, the message is reported in these logs. In this image, many RPC messages were waiting in the queue, so the `CheckExistence` message was queued for 12 seconds before being processed. In some large jobs the wait could even exceed 30 seconds, which is exactly the `rpcLookupTimeout` period, so the request would time out and the task would fail. After adopting this pull request, we no longer see `slow rpc detected` entries for `CheckExistence` in the driver logs, meaning that the sum of `queueTime` and `processTime` has dropped below 1 second.
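
To make the arithmetic above concrete, here is a minimal Python sketch of the threshold check and of how a backlog inflates queue time. It is not Celeborn's code: the names `RpcTiming`, `is_slow`, `would_time_out`, and `queue_times` are hypothetical, the 1-second and 30-second constants are taken from the description above, and the model assumes a single thread draining the queue in arrival order.

```python
from dataclasses import dataclass
from typing import List

# Assumed values, taken from the comment text rather than from Celeborn's source.
SLOW_RPC_THRESHOLD_MS = 1_000    # "exceeds 1 second"
RPC_LOOKUP_TIMEOUT_MS = 30_000   # "30 seconds ... rpcLookupTimeout"


@dataclass
class RpcTiming:
    """Hypothetical record of how long one RPC message waited and ran."""
    name: str
    queue_time_ms: int    # time spent waiting in the queue
    process_time_ms: int  # time spent being handled

    @property
    def total_ms(self) -> int:
        return self.queue_time_ms + self.process_time_ms


def is_slow(t: RpcTiming, threshold_ms: int = SLOW_RPC_THRESHOLD_MS) -> bool:
    """True when queue time plus process time exceeds the slow-RPC threshold."""
    return t.total_ms > threshold_ms


def would_time_out(t: RpcTiming, timeout_ms: int = RPC_LOOKUP_TIMEOUT_MS) -> bool:
    """True when the caller would have given up before the reply was ready."""
    return t.total_ms >= timeout_ms


def queue_times(process_times_ms: List[int]) -> List[int]:
    """Queue time of each message if all arrive at once and one thread
    handles them in order (a simplifying assumption)."""
    waited = 0
    result = []
    for p in process_times_ms:
        result.append(waited)  # this message waits for everything ahead of it
        waited += p
    return result
```

Under these assumptions, a `CheckExistence` message with a 12-second queue time is flagged as slow but does not yet hit the 30-second timeout, and a message that sits behind enough queued work eventually crosses it.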
