morningman opened a new issue #6946:
URL: https://github.com/apache/incubator-doris/issues/6946


   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Description
   
   ## Case
   
   During a load, each tablet has a memtable that buffers the incoming data.
   When the data in a memtable exceeds 100MB, it is flushed to disk as a
   `segment` file, and a new memtable is created to hold the data that follows.
   
   Assume a table with N buckets (tablets). The maximum total size of all
   memtables is then `N * 100MB`.
   If N is large, this consumes too much memory.
   
   So, to limit memory usage, when the total size of all memtables reaches a
   threshold (2GB by default), Doris tries to flush all current memtables to
   disk (even if they have not reached 100MB).
   
   As a result, each memtable is flushed when its size reaches about `2GB/N`,
   which may be much smaller than 100MB, resulting in too many small segment
   files.
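   
   The arithmetic above can be sketched as follows. This is only an
   illustration, assuming data is spread evenly across tablets; the function
   and parameter names are hypothetical and are not Doris identifiers.

```python
MB = 1024 * 1024
GB = 1024 * MB


def effective_flush_size(num_tablets, memtable_limit=100 * MB, total_limit=2 * GB):
    """Approximate size at which a memtable is flushed (hypothetical helper).

    Assumes every memtable grows at the same rate. A memtable is flushed
    either when it reaches its own limit or when the sum of all memtables
    reaches the global limit, whichever happens first.
    """
    return min(memtable_limit, total_limit // num_tablets)


if __name__ == "__main__":
    for n in (10, 48, 200, 1000):
        size_mb = effective_flush_size(n) / MB
        print(f"N={n:5d}  flush size ~ {size_mb:.1f} MB")
```

   With few tablets the per-memtable limit of 100MB applies; once `2GB/N`
   drops below 100MB, the global limit decides the flush size instead.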
   
   ## Solution
   
   When a flush is triggered to reduce memory consumption, do NOT flush all
   memtables; flush only part of them.
   For example, suppose there are 50 tablets (with 50 memtables) and the
   memory limit is 1GB. When each memtable reaches 20MB, the total size
   reaches 1GB and a flush occurs.
   
   If I flush only 25 of the 50 memtables, the remaining 25 still hold 20MB
   each (500MB in total). The next time the total size reaches 1GB, every
   memtable has received another 10MB, so there are 25 memtables of 10MB and
   25 memtables of 30MB. I can then flush the 30MB memtables, which are larger
   than 20MB.
   
   The main idea is to introduce some jitter into flushing so that memtable
   sizes stay slightly uneven, which ensures that a memtable is flushed only
   when it is large enough.
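   
   A toy simulation of this idea, as a rough sketch rather than the actual
   Doris implementation. It assumes evenly distributed incoming data, sizes
   in whole MB, a 1GB limit rounded to 1000MB to match the example, and a
   "largest memtables first" selection policy (the policy is an assumption;
   this issue does not specify how the memtables to flush are chosen). All
   names are hypothetical.

```python
def simulate(num_memtables, total_limit_mb, flush_fraction, rounds=200, step_mb=1):
    """Return the average size (in MB) of flushed memtables.

    Each step adds step_mb to every memtable (even load). Whenever the total
    reaches total_limit_mb, the largest int(flush_fraction * num_memtables)
    memtables (at least one) are flushed and reset to zero.
    """
    sizes = [0] * num_memtables
    flushed = []
    to_flush = max(1, int(num_memtables * flush_fraction))
    while len(flushed) < rounds * to_flush:
        sizes = [s + step_mb for s in sizes]
        if sum(sizes) >= total_limit_mb:
            # Assumed policy: flush the largest memtables first.
            order = sorted(range(num_memtables), key=lambda i: sizes[i], reverse=True)
            for i in order[:to_flush]:
                flushed.append(sizes[i])
                sizes[i] = 0
    return sum(flushed) / len(flushed)


if __name__ == "__main__":
    flush_all = simulate(50, 1000, flush_fraction=1.0)
    flush_half = simulate(50, 1000, flush_fraction=0.5)
    print(f"flush all : average flushed size {flush_all:.1f} MB")
    print(f"flush half: average flushed size {flush_half:.1f} MB")
```

   Under these assumptions, flushing everything always produces 20MB
   memtables, while flushing only half yields a larger average flushed size,
   because the memtables that survive one round keep growing until the next.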
   
   In my test, loading a table with 48 buckets and a 2GB memory limit, the
   average memtable size was 44MB in the previous version and 82MB after the
   modification.
   
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


