Re: [PR] [#1717] improvement: Pick partitions instead of shuffles for flushing [incubator-uniffle]

via GitHub Sat, 18 May 2024 07:17:43 -0700


xianjingfeng commented on PR #1718:
URL: 
https://github.com/apache/incubator-uniffle/pull/1718#issuecomment-2118838792


   > If the range of [low-watermark, high-watermark] is large, this problem 
with this patch should still exist, right?
   
   You are right, but if the range of [low-watermark, high-watermark] is not 
large and the size of a shuffle is large, this patch will work.
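
To illustrate why flush granularity matters when the watermark range is small, here is a minimal sketch. It is not Uniffle's actual selection code, which this message does not show. The buffer contents, the `pick_until` helper, and the largest-first ordering are all hypothetical.

```python
from collections import defaultdict

def pick_until(units, target):
    """Greedily pick the largest units until the picked total reaches target."""
    picked, total = [], 0
    for name, size in sorted(units, key=lambda u: u[1], reverse=True):
        if total >= target:
            break
        picked.append(name)
        total += size
    return picked, total

# Hypothetical in-memory buffer, sizes in GB, keyed by (shuffle, partition).
partitions = {
    ("shuffle-1", 0): 20, ("shuffle-1", 1): 20, ("shuffle-1", 2): 20,
    ("shuffle-1", 3): 20, ("shuffle-1", 4): 20, ("shuffle-1", 5): 20,
    ("shuffle-2", 0): 10, ("shuffle-2", 1): 10,
}

# Aggregate partition sizes to get per-shuffle sizes.
shuffle_sizes = defaultdict(int)
for (shuffle, _), size in partitions.items():
    shuffle_sizes[shuffle] += size

target = 40  # amount that needs to be freed, in GB

by_shuffle = pick_until(shuffle_sizes.items(), target)
by_partition = pick_until(partitions.items(), target)

print("by shuffle:  ", by_shuffle)    # flushes all of shuffle-1: 120 GB
print("by partition:", by_partition)  # flushes two partitions: 40 GB
```

With a single large shuffle, picking whole shuffles overshoots the target by a wide margin, while picking partitions can stop close to it.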
   
   > +1. I'm not sure how your strategy to choose partitions over shuffles will 
improve performance: reduce GC or disk access.
   
   This is the configuration I used when testing:
   `rss.server.buffer.capacity=400g`
   `rss.server.memory.shuffle.lowWaterMark.percentage=75`
   `rss.server.memory.shuffle.highWaterMark.percentage=85`
   
   My goal is to make full use of the memory, so I expect it to write only 40g 
to the disk when memory usage reaches 340g (the 85% high watermark), bringing 
usage back down to 300g (the 75% low watermark). But it actually writes 340g. 
When a large number of buffers are written to the disk, a large number of 
objects are generated and GC becomes very frequent. Moreover, when hundreds of 
GB of data are written to the disk at the same time, the system load becomes 
very high, which slows GC down further.
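
The 40g and 340g figures follow directly from the configuration above. A small sketch of the arithmetic, assuming the percentages apply to `rss.server.buffer.capacity` and that `g` is treated as a plain unit (variable and function names here are illustrative, not Uniffle identifiers):

```python
def watermark(capacity_g: int, percentage: int) -> int:
    """Return the watermark in the same unit as capacity_g."""
    return capacity_g * percentage // 100

capacity_g = 400   # rss.server.buffer.capacity=400g
low_pct = 75       # rss.server.memory.shuffle.lowWaterMark.percentage
high_pct = 85      # rss.server.memory.shuffle.highWaterMark.percentage

low_g = watermark(capacity_g, low_pct)
high_g = watermark(capacity_g, high_pct)
expected_flush_g = high_g - low_g  # amount needed to drop from high to low

print(f"low watermark:  {low_g}g")
print(f"high watermark: {high_g}g")
print(f"expected flush: {expected_flush_g}g")
```

Flushing everything in memory at the high watermark would write 340g, which is 8.5 times the 40g needed to get back to the low watermark.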
   
   @zuston @advancedxy 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


