xianjingfeng commented on PR #1718: URL: https://github.com/apache/incubator-uniffle/pull/1718#issuecomment-2151807053
> > This PR is for reducing the amount of data written to disk.
>
> When the server's memory is insufficient, it always has to flush to the disk. If the server's memory is not increased, whether it is flushing part of the data to the disk with this PR, or the previous flushing of a complete ShuffleBuffer to the disk, at the end of the task, the total amount of data that the server flushes to the disk should be similar, right? Is it possible that if I flush less each time, it will reduce the amount of data written to the disk? To be more extreme, if I have to write 100TB of shuffle data to the shuffle server, and the server only has 300GB of memory, ultimately over 99TB of data has to be written to the disk, regardless of whether this PR is used or not.
>
> So, I don't understand what you mean by saying `reducing the amount of data written to disk`.

In our production environment, the maximum disk usage is about 5 TB per node, but we have 1.4 TB of memory per node. If a shuffle occupies a lot of memory but completes quickly, there is no need to flush all of its data to disk.
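To make the argument concrete, here is a toy model. It is hypothetical: the `simulate` function, the `high`/`low` watermarks, and both flush policies are illustrative assumptions, not Uniffle's actual flush code. When memory usage exceeds `high`, the server flushes until usage is back at `low`, either by writing whole buffers (largest first) or by writing only as much of the largest buffer as needed. A shuffle that finishes while its data is still in memory releases that data without any disk write.

```python
from collections import defaultdict


def simulate(events, high, low, partial):
    """Toy model of a memory-triggered flush (hypothetical, not Uniffle code).

    events: ("write", shuffle_id, size) or ("finish", shuffle_id) tuples, in order.
    high/low: watermarks in arbitrary units; a flush starts when usage exceeds
    `high` and continues until usage is at or below `low` (assumes 0 <= low <= high).
    partial=False flushes whole buffers, largest first; partial=True flushes only
    as much of the largest buffer as is needed to get back down to `low`.
    Returns the total number of units written to disk.
    """
    in_memory = defaultdict(int)  # shuffle_id -> units still held in memory
    flushed = 0

    def usage():
        return sum(in_memory.values())

    for event in events:
        if event[0] == "write":
            _, shuffle_id, size = event
            in_memory[shuffle_id] += size
            if usage() > high:
                while usage() > low:
                    victim = max(in_memory, key=in_memory.get)
                    if partial:
                        amount = min(in_memory[victim], usage() - low)
                    else:
                        amount = in_memory[victim]
                    in_memory[victim] -= amount
                    flushed += amount
        else:
            _, shuffle_id = event
            # Finished shuffle: its in-memory data is dropped without touching disk.
            in_memory.pop(shuffle_id, None)
    return flushed


# A large shuffle that finishes quickly, plus a small one.
burst = [("write", "A", 90), ("write", "B", 20), ("finish", "A"), ("finish", "B")]
print(simulate(burst, high=100, low=80, partial=False))  # 90
print(simulate(burst, high=100, low=80, partial=True))   # 30
```

In this sketch the saving comes only from data that is released before it would have been flushed. A single long-running shuffle that keeps writing far more than memory can hold ends up flushing most of its data under either policy, which is consistent with the 100TB example above.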
