xianjingfeng commented on PR #1718:
URL: https://github.com/apache/incubator-uniffle/pull/1718#issuecomment-2151807053

   > > This pr is for reducing the amount of data written to disk.
   > 
   > When the server's memory is insufficient, it always has to flush to disk.
   > If the server's memory is not increased, then whether this PR flushes part
   > of the data to disk or the previous logic flushes a complete ShuffleBuffer,
   > the total amount of data the server flushes to disk by the end of the task
   > should be similar, right? Is it possible that flushing less each time
   > reduces the amount of data written to disk? To take an extreme example: if
   > I write 100TB of shuffle data to a shuffle server that has only 300GB of
   > memory, over 99TB of data ultimately has to be written to disk, regardless
   > of whether this PR is used.
   > 
   > So, I don't understand what you mean by `reducing the amount of data
   > written to disk`.
   
   In our production environment, peak disk usage is about 5TB per node, but
   each node has 1.4TB of memory. If a shuffle occupies a lot of memory but
   completes quickly, there is no need to flush all of its data to disk;
   flushing only the portion needed to relieve memory pressure writes less to
   disk overall.
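   To make the arithmetic concrete, here is a minimal toy sketch, not Uniffle's
   actual buffer manager. It assumes a hypothetical node with 100 units of
   memory, a long-running shuffle holding 40 units, and a short-lived shuffle
   holding 80 units, and it assumes data still in memory when a shuffle
   finishes is served from memory and released. The names `FlushPolicySketch`,
   `simulate`, `capacity`, `longShuffle` and `shortShuffle` are all
   illustrative. It compares flushing the whole largest buffer with flushing
   only the excess over capacity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlushPolicySketch {

    // Hypothetical toy model, NOT Uniffle's real logic: returns how many
    // memory units end up written to disk under one of two flush policies.
    static long simulate(boolean partialFlush) {
        final long capacity = 100; // assumed memory budget, arbitrary units
        Map<String, Long> inMemory = new LinkedHashMap<>();
        long flushedToDisk = 0;

        inMemory.put("longShuffle", 40L);  // long-running job's buffered data
        inMemory.put("shortShuffle", 80L); // short-lived shuffle using lots of memory

        long used = inMemory.values().stream().mapToLong(Long::longValue).sum();
        if (used > capacity) {
            long excess = used - capacity;
            // Pick the largest buffer as the flush victim.
            String victim = inMemory.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .get()
                    .getKey();
            long buffered = inMemory.get(victim);
            // Full policy: flush the whole buffer. Partial policy: flush only the excess.
            long toFlush = partialFlush ? Math.min(excess, buffered) : buffered;
            inMemory.put(victim, buffered - toFlush);
            flushedToDisk += toFlush;
        }

        // The short shuffle completes: in this model, whatever is still buffered
        // is read from memory and released, so it never reaches disk.
        inMemory.remove("shortShuffle");
        return flushedToDisk;
    }

    public static void main(String[] args) {
        System.out.println("full-buffer flush: " + simulate(false)); // 80
        System.out.println("partial flush:     " + simulate(true));  // 20
    }
}
```

   Under these assumptions the total memory demand exceeds capacity by only 20
   units, so a partial flush writes 20 units to disk instead of 80. This is the
   case described above, where total disk writes are not fixed by the data
   volume alone. In the 100TB / 300GB example the two policies would converge,
   because nearly everything must spill regardless.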

