xianjingfeng commented on PR #1681:
URL: 
https://github.com/apache/incubator-uniffle/pull/1681#issuecomment-2100027407

   > This has been ensured by the driver + executor side (read+write rpc) heartbeat. If you think the heartbeat is too loose, a longer heartbeat timeout should be better.
   
   1. If the client is reading data from HDFS, executors do not send heartbeats to the shuffle servers.
   2. We currently work around the problem by setting `spark.rss.client.heartBeat.threadNum`, but this method is unreliable: we cannot guarantee that applications will not expire.
   3. A longer heartbeat timeout means that expired data on local disk is kept longer.
   4. We currently use multiple replicas, so the data written by the expired shuffle server is still useful.
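
   To make points 1 and 3 concrete, here is a toy model of heartbeat-based expiry. It is only a sketch and not Uniffle's actual code: `ToyShuffleServer`, `timeout_s`, and the timings are made-up names and values for illustration. It assumes a server expires an app once no heartbeat has arrived within the timeout.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShuffleServer:
    """Hypothetical model of heartbeat-based app expiry (not Uniffle code)."""
    timeout_s: float
    last_heartbeat: dict = field(default_factory=dict)

    def heartbeat(self, app_id: str, now: float) -> None:
        # Record the time of the most recent heartbeat for this app.
        self.last_heartbeat[app_id] = now

    def expired_apps(self, now: float) -> list:
        # An app counts as expired once its last heartbeat is older than the timeout.
        return sorted(
            app for app, seen in self.last_heartbeat.items()
            if now - seen > self.timeout_s
        )


# Short timeout: the app stops heartbeating at t=0 (e.g. it is reading
# from HDFS) and is treated as expired 90 s later, although it is still running.
short = ToyShuffleServer(timeout_s=60.0)
short.heartbeat("app-1", now=0.0)
expired_with_short_timeout = short.expired_apps(now=90.0)

# Long timeout: the running app survives, but data of an app that really
# finished at t=0 also stays on local disk for the whole timeout.
long = ToyShuffleServer(timeout_s=600.0)
long.heartbeat("app-1", now=0.0)
expired_with_long_timeout = long.expired_apps(now=90.0)

print(expired_with_short_timeout, expired_with_long_timeout)
```

   In this model, raising the timeout only trades one problem for the other, which is why tuning the timeout alone does not look sufficient.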
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.