xianjingfeng commented on PR #1718: URL: https://github.com/apache/incubator-uniffle/pull/1718#issuecomment-2119575106
> > My goal is to make full use of the memory, so I hope that it only writes 40 GB to the disk when the memory reaches 340 GB. But actually it writes 340 GB.
>
> So, the reason one flush writes ~340 GB of shuffle data is that there's one large shuffle (from one specific app) that occupies ~340 GB of memory?
>
> This is a valid use case and we should improve it. However, I'm a bit worried about the implications of the new approach:
>
> 1. it might be quite expensive to maintain the top-N buffers, since there may be a lot of buffers across all the shuffles in one shuffle server
> 2. unbalanced access for partition data in shuffle server memory and on disk/HDFS: part of the data stays in memory and part is flushed, for almost every shuffle
>
> How about we make an incremental improvement instead:
>
> 1. pick the shuffles to flush
> 2. if the flush size is way larger than (high_watermark - low_watermark) * buffer_capacity, pick the top N buffers in the picked shuffles (which should be one, or two)

ok for me.
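
The two-step proposal above can be sketched as follows. This is a minimal illustration, not Uniffle's actual implementation: the class and record names (`FlushPicker`, `Buffer`), the 1.5x overshoot factor, and the concrete watermark values are all assumptions made for the example; the real shuffle server works with its own `ShuffleBuffer`/`ShuffleBufferManager` types and configurable watermarks.

```java
import java.util.*;

public class FlushPicker {
    // Assumed watermark settings for the sketch; Uniffle's are configurable.
    static final double HIGH_WATERMARK = 0.75;
    static final double LOW_WATERMARK = 0.25;
    // Assumed tunable: how much overshoot of the flush target triggers
    // per-buffer (top-N) selection instead of whole-shuffle flushing.
    static final double OVERSHOOT_FACTOR = 1.5;

    // Stand-in for a partition buffer held in server memory.
    record Buffer(String shuffleId, int partition, long size) {}

    // Amount of memory one flush event should release.
    static long flushTarget(long bufferCapacity) {
        return (long) ((HIGH_WATERMARK - LOW_WATERMARK) * bufferCapacity);
    }

    static List<Buffer> pick(Map<String, List<Buffer>> byShuffle, long capacity) {
        long target = flushTarget(capacity);

        // Step 1: rank shuffles by total buffered size, largest first.
        List<String> ranked = byShuffle.keySet().stream()
            .sorted(Comparator.comparingLong(
                (String s) -> byShuffle.get(s).stream().mapToLong(Buffer::size).sum())
                .reversed())
            .toList();

        List<Buffer> picked = new ArrayList<>();
        long pickedSize = 0;
        for (String shuffle : ranked) {
            if (pickedSize >= target) break;
            List<Buffer> bufs = byShuffle.get(shuffle);
            long shuffleSize = bufs.stream().mapToLong(Buffer::size).sum();

            if (pickedSize + shuffleSize > OVERSHOOT_FACTOR * target) {
                // Step 2: flushing the whole shuffle would overshoot the
                // target by a lot, so take only its largest buffers.
                List<Buffer> byLargest = bufs.stream()
                    .sorted(Comparator.comparingLong(Buffer::size).reversed())
                    .toList();
                for (Buffer b : byLargest) {
                    if (pickedSize >= target) break;
                    picked.add(b);
                    pickedSize += b.size();
                }
            } else {
                picked.addAll(bufs);
                pickedSize += shuffleSize;
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        // One huge shuffle (~340 GB across three buffers) plus a small one,
        // mirroring the scenario described in the thread.
        Map<String, List<Buffer>> byShuffle = new HashMap<>();
        byShuffle.put("app1-shuffle0", List.of(
            new Buffer("app1-shuffle0", 0, 100L << 30),
            new Buffer("app1-shuffle0", 1, 120L << 30),
            new Buffer("app1-shuffle0", 2, 120L << 30)));
        byShuffle.put("app2-shuffle0", List.of(
            new Buffer("app2-shuffle0", 0, 5L << 30)));

        long capacity = 400L << 30; // 400 GB buffer capacity
        List<Buffer> picked = pick(byShuffle, capacity);
        long pickedSize = picked.stream().mapToLong(Buffer::size).sum();

        // Target is (0.75 - 0.25) * 400 GB = 200 GB, so only the two largest
        // buffers (240 GB) are flushed rather than the full 340 GB shuffle.
        System.out.println(picked.size() + " buffers, " + (pickedSize >> 30) + " GB");
    }
}
```

With these numbers the sketch flushes 240 GB instead of 340 GB, which is the improvement the thread is after: memory stays better utilized while the flush still drops the server below the low watermark.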
