imay commented on issue #2016: [Proposal] Limit the memory usage of Compaction
URL: https://github.com/apache/incubator-doris/issues/2016#issuecomment-544143260

# Compaction ratio statistic

> To estimate the amount of memory used by a Compaction, it is mainly to estimate the size of a row in memory. We can simply use the ratio of the size of a memtable in memory to the size of file it is written on disk as the compaction ratio. With this ratio, the size of the data file on the disk, and the number of rows in file, we can calculate the approximate occupancy of a single row of data in memory.

Why not estimate the average row length by computing the ratio of bytes written to the number of rows when generating a rowset? This value can then be recorded in the rowset meta.

> ### Supported compaction within a version
> Currently only Compaction with at least one version is supported. And if there are too many Segments in a single version, it still consumes a lot of memory. So we need to support compaction with a subset of segments with a single version.

It would be better to merge enough segments into a bigger segment at load time rather than leaving this to compaction:

1. It keeps the compaction logic simple.
2. It is good for reads. If a rowset ever has 1000 segments, read performance will be terrible.
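To make the row-length suggestion concrete, here is a minimal sketch in Python. It is not Doris code: `RowsetStats`, `avg_row_bytes`, and `estimate_compaction_memory` are hypothetical names, "bytes written" is assumed to mean the in-memory size of the rows handed to the writer, and the memory model (each input rowset keeps one batch of rows resident) is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RowsetStats:
    """Hypothetical counters recorded at load time (not Doris's real rowset meta)."""
    num_rows: int
    bytes_written: int  # assumed: in-memory bytes of the rows given to the writer


def avg_row_bytes(stats: RowsetStats) -> float:
    """Average row length = bytes written / number of rows."""
    if stats.num_rows == 0:
        return 0.0  # an empty rowset contributes nothing
    return stats.bytes_written / stats.num_rows


def estimate_compaction_memory(rowsets: Iterable[RowsetStats], batch_rows: int) -> float:
    """Rough estimate, assuming each input rowset holds one batch in memory."""
    return sum(avg_row_bytes(r) * batch_rows for r in rowsets)


# Usage: two input rowsets, merged with a 1024-row batch per input.
inputs = [RowsetStats(num_rows=1000, bytes_written=64000),
          RowsetStats(num_rows=500, bytes_written=16000)]
estimate = estimate_compaction_memory(inputs, batch_rows=1024)
```

Because both counters are already known when the rowset is generated, this estimate would need no separate memtable-to-file ratio.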
