imay commented on issue #2016: [Proposal] Limit the memory usage of Compaction
URL: 
https://github.com/apache/incubator-doris/issues/2016#issuecomment-544143260
 
 
   # Compaction ratio statistic
   > To estimate the amount of memory used by a Compaction, it is mainly to 
estimate the size of a row in memory. We can simply use the ratio of the size 
of a memtable in memory to the size of file it is written on disk as the 
compaction ratio. With this ratio, the size of the data file on the disk, and 
the number of rows in file, we can calculate the approximate occupancy of a 
single row of data in memory.
   > 
   
   Why not estimate the average row length as the ratio of bytes written to 
the number of rows when generating a rowset? This value can then be recorded 
in the rowset meta.
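   
   A minimal sketch of that idea, under stated assumptions: `RowsetMeta`, its fields, and `estimate_merge_buffer_bytes` are hypothetical names for illustration, not the actual Doris classes, and "bytes written" is assumed to mean the uncompressed bytes handed to the rowset writer.

```python
from dataclasses import dataclass


@dataclass
class RowsetMeta:
    """Hypothetical stand-in for rowset meta; field names are illustrative."""
    num_rows: int = 0
    bytes_written: int = 0  # assumed: uncompressed bytes handed to the writer

    def add_batch(self, num_rows: int, num_bytes: int) -> None:
        # Accumulate counters while the rowset is being generated.
        self.num_rows += num_rows
        self.bytes_written += num_bytes

    @property
    def avg_row_bytes(self) -> float:
        # Average row length = bytes written / number of rows.
        return self.bytes_written / self.num_rows if self.num_rows else 0.0


def estimate_merge_buffer_bytes(rowsets, rows_per_batch: int) -> int:
    """Rough memory needed to hold one batch of rows per input rowset."""
    return sum(int(m.avg_row_bytes * rows_per_batch) for m in rowsets)
```

   With the average stored per rowset, a compaction could sum the estimates for its inputs before starting and skip or shrink the task if the total exceeds its memory budget.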
   
   > ### Supported compaction within a version
   > Currently only Compaction with at least one version is supported. And if 
there are too many Segments in a single version, it still consumes a lot of 
memory. So we need to support compaction with a subset of segments with a 
single version.
   > 
   
   It would be better to merge enough segments into a bigger segment during 
loading rather than leaving that to compaction: 1. this keeps the compaction 
logic simple; 2. it is good for reads. Otherwise, a rowset with 1000 segments 
will have terrible read performance.
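   
   One way the load-time merge could be planned, as a rough sketch only: `plan_segment_merges` and `target_bytes` are hypothetical, and the greedy grouping of consecutive segments is an assumption, not something the proposal specifies.

```python
from typing import List


def plan_segment_merges(segment_sizes: List[int], target_bytes: int) -> List[List[int]]:
    """Group consecutive segment indices so each group stays within target_bytes.

    A segment that is larger than target_bytes on its own forms a single-element group.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for idx, size in enumerate(segment_sizes):
        # Close the current group if adding this segment would exceed the target.
        if current and current_bytes + size > target_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += size
    if current:
        groups.append(current)
    return groups
```

   Each returned group would be merged into one output segment before the rowset is published, so the segment count seen by readers and by compaction stays bounded.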
   
   

