leixm commented on issue #198:
URL: https://github.com/apache/incubator-uniffle/issues/198#issuecomment-1246378735

   > > > > Response may not be sent by shuffle server timely.
   > > > 
   > > > 
   > > > Do we have some ways to avoid this problem?
   > > 
   > > 
   > > This is the response timeout, which should be caused by the high load of ShuffleServer and slow response.
   > 
   > But I don't find any exception in the gRPC metrics. How did you determine that this problem was caused by a shuffle server under high pressure?
   
   We hit a similar RPC timeout exception before; it showed up when running a 
task with 10T of data. The investigation found that inflush_memory and 
used_memory were too high, which caused the client to retry sending data and 
requesting buffers frequently.
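
As a rough illustration of that failure mode, here is a toy sketch (not Uniffle code). The class `FakeShuffleServer`, the method `require_buffer`, and all numbers are hypothetical; only the names inflush_memory and used_memory come from the metrics mentioned above. It assumes the server rejects a buffer request whenever the request does not fit in the memory left after used_memory and inflush_memory, and that the client keeps retrying until its timeout expires.

```python
from dataclasses import dataclass


@dataclass
class FakeShuffleServer:
    """Toy stand-in for a shuffle server; not Uniffle's real implementation."""
    capacity: int        # total buffer memory, in bytes (hypothetical)
    used_memory: int     # memory held by buffered data
    inflush_memory: int  # memory held by data currently being flushed

    def require_buffer(self, size: int) -> bool:
        # Grant the request only if it fits in the memory that is still free.
        free = self.capacity - self.used_memory - self.inflush_memory
        if size <= free:
            self.used_memory += size
            return True
        return False


def send_with_retry(server, size, timeout_ms, retry_interval_ms):
    """Return (succeeded, attempts, elapsed_ms) using a simulated clock."""
    elapsed_ms = 0
    attempts = 0
    while elapsed_ms < timeout_ms:
        attempts += 1
        if server.require_buffer(size):
            return True, attempts, elapsed_ms
        elapsed_ms += retry_interval_ms  # simulated wait before asking again
    return False, attempts, elapsed_ms


idle = FakeShuffleServer(capacity=1000, used_memory=100, inflush_memory=100)
busy = FakeShuffleServer(capacity=1000, used_memory=600, inflush_memory=350)

print(send_with_retry(idle, 200, timeout_ms=1000, retry_interval_ms=100))
print(send_with_retry(busy, 200, timeout_ms=1000, retry_interval_ms=100))
```

In this sketch the idle server grants the buffer on the first attempt, while the busy one rejects every attempt until the client's timeout is used up, which is the pattern of repeated retries described above.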


