weizuo93 opened a new issue #4797:
URL: https://github.com/apache/incubator-doris/issues/4797
A MemTable flush can be triggered in three situations:
A. The size of the MemTable reaches the flush threshold
`config::write_buffer_size`;
B. The data load has finished and the DeltaWriter needs to be closed;
C. Memory consumption exceeds the limit and the MemTable must be flushed to
reduce memory usage.
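Because the proposal below treats each situation differently, a flush needs to know why it was triggered. The following is a minimal sketch of that classification. `FlushReason` and `classify_flush` are hypothetical names, not Doris APIs, and checking A before B before C is an assumption made only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical names for illustration; not part of the Doris code base.
enum class FlushReason {
    kSizeThreshold,  // A: MemTable reached config::write_buffer_size
    kLoadClose,      // B: load finished, DeltaWriter is being closed
    kMemoryLimit,    // C: memory consumption exceeded the limit
};

// Decide which situation (if any) triggers a flush.
// The check order A, B, C is an assumption made for this sketch.
std::optional<FlushReason> classify_flush(std::size_t memtable_bytes,
                                          std::size_t write_buffer_size,
                                          bool load_finished,
                                          bool over_memory_limit) {
    assert(write_buffer_size > 0);
    if (memtable_bytes >= write_buffer_size) return FlushReason::kSizeThreshold;
    if (load_finished) return FlushReason::kLoadClose;
    if (over_memory_limit) return FlushReason::kMemoryLimit;
    return std::nullopt;  // no flush needed yet; keep buffering rows
}
```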
When small batches of data (far smaller than `config::write_buffer_size`) are
loaded frequently, "situation B" above generates a large number of small segment
files, because every load ends with a flush no matter how little data its
MemTable holds. This lowers the efficiency of scan operations.
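A rough count shows the scale of the problem. This sketch assumes a simplified model in which a load writes all of its bytes into one tablet's MemTable, a flush happens each time the MemTable reaches `write_buffer_size` (situation A), and closing the DeltaWriter flushes whatever remains (situation B). The function names are hypothetical, and the 100 MiB threshold in the example is illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Segments one load writes to one tablet under the simplified model:
// one per full buffer (situation A) plus one closing flush for any remainder (situation B).
std::size_t segments_per_load(std::size_t load_bytes, std::size_t write_buffer_size) {
    assert(write_buffer_size > 0);
    std::size_t full_flushes = load_bytes / write_buffer_size;
    bool has_remainder = load_bytes % write_buffer_size != 0;
    return full_flushes + (has_remainder ? 1 : 0);
}

// Segments needed if the same bytes were accumulated across loads
// and only ever flushed in full buffers.
std::size_t segments_if_accumulated(std::size_t total_bytes, std::size_t write_buffer_size) {
    assert(write_buffer_size > 0);
    return (total_bytes + write_buffer_size - 1) / write_buffer_size;  // ceiling division
}
```

With a 100 MiB threshold, 1000 loads of 1 MiB each produce 1000 one-segment flushes under this model, while the same 1000 MiB accumulated into full buffers would need only 10 segments.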
We can optimize the MemTable flush mechanism as follows:
(1) Maintain a `vector<MemTable>` for each tablet;
(2) When the DeltaWriter is closed and the MemTable of a tablet is flushed, if
there has been no earlier flush for this tablet during the load, do not reset
the MemTable; instead, push it into `vector<MemTable>`;
(3) When the next flush for this tablet comes, determine which situation
triggered it:
a. "situation A": merge all MemTables in `vector<MemTable>` into the
current MemTable, flush the merged MemTable, delete all the rowsets
corresponding to the MemTables in `vector<MemTable>`, and clear `vector<MemTable>`;
b. "situation B": if the total size of the MemTables in `vector<MemTable>`
plus the current MemTable reaches the threshold `config::write_buffer_size`,
merge all MemTables in `vector<MemTable>` into the current MemTable, flush the
merged MemTable, delete all the rowsets corresponding to the MemTables in
`vector<MemTable>`, and clear `vector<MemTable>`. If the total size is less
than the threshold `config::write_buffer_size`, flush only the current MemTable
and push it into `vector<MemTable>`;
c. "situation C": flush the current MemTable, reset all the MemTables in
`vector<MemTable>` to release their memory, and clear `vector<MemTable>`.
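The per-tablet decision logic in step (3) could look roughly like the sketch below. It is a simplified, single-threaded model under several assumptions: `TabletFlushState`, `Rowset`, `FlushReason`, and the byte-count-only `MemTable` are hypothetical stand-ins rather than Doris classes; "merging" just adds sizes, whereas a real merge would combine the sorted rows; and the step (2) check for an earlier flush in the same load is omitted, so every small closing flush is retained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Doris classes.
enum class FlushReason { kSizeThreshold, kLoadClose, kMemoryLimit };  // situations A, B, C

struct MemTable {
    std::size_t bytes = 0;  // size of the buffered rows
};

struct Rowset {
    int id;
    std::size_t bytes;
};

// Per-tablet state holding the retained vector<MemTable> from step (1).
class TabletFlushState {
public:
    explicit TabletFlushState(std::size_t write_buffer_size) : threshold_(write_buffer_size) {
        assert(write_buffer_size > 0);
    }

    void flush(const MemTable& current, FlushReason reason) {
        std::size_t retained_bytes = 0;
        for (const Retained& r : retained_) retained_bytes += r.table.bytes;

        switch (reason) {
        case FlushReason::kSizeThreshold:  // (3a): always merge and flush
            merge_and_flush(current);
            break;
        case FlushReason::kLoadClose:  // (3b): merge only once the threshold is reached
            if (retained_bytes + current.bytes >= threshold_) {
                merge_and_flush(current);
            } else {
                int id = write_rowset(current);      // the small rowset is written now...
                retained_.push_back({current, id});  // ...but the MemTable stays in memory
            }
            break;
        case FlushReason::kMemoryLimit:  // (3c): flush current, drop retained MemTables
            write_rowset(current);
            retained_.clear();  // frees memory; their small rowsets are kept as-is
            break;
        }
    }

    const std::vector<Rowset>& rowsets() const { return rowsets_; }
    std::size_t retained_count() const { return retained_.size(); }

private:
    struct Retained {
        MemTable table;
        int rowset_id;  // small rowset already written from this MemTable
    };

    void merge_and_flush(const MemTable& current) {
        MemTable merged = current;
        for (const Retained& r : retained_) merged.bytes += r.table.bytes;  // real merge combines rows
        write_rowset(merged);  // write the merged rowset first...
        for (const Retained& r : retained_) delete_rowset(r.rowset_id);  // ...then drop superseded ones
        retained_.clear();
    }

    int write_rowset(const MemTable& table) {
        int id = next_id_++;
        rowsets_.push_back({id, table.bytes});
        return id;
    }

    void delete_rowset(int id) {
        rowsets_.erase(std::remove_if(rowsets_.begin(), rowsets_.end(),
                                      [id](const Rowset& r) { return r.id == id; }),
                       rowsets_.end());
    }

    std::size_t threshold_;
    std::vector<Retained> retained_;
    std::vector<Rowset> rowsets_;
    int next_id_ = 0;
};
```

For example, with a threshold of 100 bytes, three closing flushes of 30 bytes each leave three small rowsets and three retained MemTables; a fourth brings the total to 120 bytes, so all four are merged into one rowset and the three small ones are deleted. The sketch writes the merged rowset before deleting the small ones so the data is never absent from the tablet in between.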