leixm commented on issue #198: URL: https://github.com/apache/incubator-uniffle/issues/198#issuecomment-1246378735
> > > > Response may not be sent by shuffle server timely.
> > >
> > > Do we have some ways to avoid this problem?
> >
> > This is the response timeout, which should be caused by the high load of ShuffleServer and slow response.
>
> But I don't find any exception in the gRPC metrics. How did you determine that this problem was caused by a heavily loaded shuffle server?

We hit a similar RPC timeout exception before. It appeared in jobs processing about 10 TB of data. The investigation found that inflush_memory and used_memory on the shuffle server were too high, which caused the client to repeatedly retry sending data and requesting buffers.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@uniffle.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
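To illustrate how high server memory can turn into a client-side timeout, here is a minimal Python sketch of a simplified model, not Uniffle's code. It assumes the server grants a buffer only when used_memory + inflush_memory + the requested size fits within a fixed capacity, and that the client otherwise sleeps for a fixed interval and retries until a deadline. The names `HypotheticalServerMemory`, `try_require_buffer`, `send_with_retry`, `retry_interval_s` and `deadline_s` are all hypothetical.

```python
import time


class HypotheticalServerMemory:
    """Simplified, hypothetical model of shuffle-server memory accounting."""

    def __init__(self, capacity: int, used: int, in_flush: int) -> None:
        self.capacity = capacity  # total buffer memory (arbitrary units)
        self.used = used          # memory holding data not yet flushed
        self.in_flush = in_flush  # memory currently being flushed to storage

    def try_require_buffer(self, size: int) -> bool:
        # Grant the buffer only if it fits next to memory that is still
        # in use or in flight to storage; otherwise reject the request.
        if self.used + self.in_flush + size <= self.capacity:
            self.used += size
            return True
        return False


def send_with_retry(server: HypotheticalServerMemory, size: int,
                    retry_interval_s: float, deadline_s: float) -> int:
    """Retry buffer requests until one is granted or the deadline passes.

    Returns the number of attempts made; raises TimeoutError if the
    next retry would exceed the deadline.
    """
    start = time.monotonic()
    attempts = 0
    while True:
        attempts += 1
        if server.try_require_buffer(size):
            return attempts
        # Stop before sleeping past the caller's deadline.
        if time.monotonic() - start + retry_interval_s > deadline_s:
            raise TimeoutError(f"no buffer granted after {attempts} attempts")
        time.sleep(retry_interval_s)


if __name__ == "__main__":
    idle = HypotheticalServerMemory(capacity=100, used=10, in_flush=10)
    print("attempts on idle server:", send_with_retry(idle, 20, 0.01, 1.0))

    busy = HypotheticalServerMemory(capacity=100, used=60, in_flush=40)
    try:
        send_with_retry(busy, 1, 0.01, 0.05)
    except TimeoutError as exc:
        print("busy server:", exc)
```

In this model, if flushing does not release memory quickly enough, every attempt is rejected and the time spent retrying accumulates until the caller's deadline, so the failure shows up as a timeout on the client rather than as an error from any individual request.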