[
https://issues.apache.org/jira/browse/IOTDB-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714831#comment-17714831
]
Tian Jiang commented on IOTDB-5792:
-----------------------------------
The design and evaluation are available at:
[https://apache-iotdb.feishu.cn/docx/TAu9dKrFioYQxmxhgUVcVrmmnWf]
Please feel free to leave any comments.
> Parallel encoding in MemTable flush
> -----------------------------------
>
> Key: IOTDB-5792
> URL: https://issues.apache.org/jira/browse/IOTDB-5792
> Project: Apache IoTDB
> Issue Type: Improvement
> Components: Core/Engine
> Reporter: Tian Jiang
> Assignee: Tian Jiang
> Priority: Major
> Labels: encoding, flush, pull-request-available
> Fix For: master branch
>
>
> Currently, there is only one encoding task for each MemTable flushing task.
> In other words, the encoding step of a MemTable flush runs entirely on a
> single thread. Thus, when the MemTable is large, encoding is considerably
> time-consuming. This is especially true when the computing power of a single
> core is low, which is common for commercial servers with many cores.
> In one of my experiments, there were 1M time series (datatype = double) in a
> MemTable, and the average number of points per series was around 300, making
> the total size of the MemTable about 5GB. Encoding such a MemTable took,
> incredibly, over 100s. The system easily enters a reject status because
> flushing is so slow.
> Since the encoding process is naturally parallelizable (it is a purely
> in-memory operation with perfect locality), I would like to propose replacing
> the single-threaded encoding process with multiple threads.
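To make the proposal concrete, here is a minimal sketch of per-series parallel encoding. It is not the IoTDB implementation: the class ParallelEncodeSketch, the methods encodeSeries and encodeAll, and the raw (timestamp, value) byte layout are all hypothetical stand-ins for the real encoders. The sketch assumes each series can be encoded independently and that the encoded chunks must come back in their original order so that a single writer can append them sequentially.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEncodeSketch {

  // Hypothetical stand-in for a real encoder: packs (timestamp, value) pairs as raw bytes.
  static byte[] encodeSeries(long[] timestamps, double[] values) {
    ByteBuffer buf = ByteBuffer.allocate(timestamps.length * (Long.BYTES + Double.BYTES));
    for (int i = 0; i < timestamps.length; i++) {
      buf.putLong(timestamps[i]);
      buf.putDouble(values[i]);
    }
    return buf.array();
  }

  // Encodes every series on a fixed-size thread pool and returns the results
  // in the same order as the input lists.
  static List<byte[]> encodeAll(List<long[]> times, List<double[]> values, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<byte[]>> futures = new ArrayList<>();
      for (int i = 0; i < times.size(); i++) {
        final long[] t = times.get(i);
        final double[] v = values.get(i);
        // Series share no state, so each one is an independent task.
        futures.add(pool.submit(() -> encodeSeries(t, v)));
      }
      List<byte[]> encoded = new ArrayList<>();
      for (Future<byte[]> f : futures) {
        encoded.add(f.get()); // blocks until this series is done; preserves input order
      }
      return encoded;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<long[]> times = new ArrayList<>();
    List<double[]> values = new ArrayList<>();
    for (int s = 0; s < 4; s++) {
      long[] t = new long[300];
      double[] v = new double[300];
      for (int i = 0; i < 300; i++) {
        t[i] = i;
        v[i] = s + i * 0.1;
      }
      times.add(t);
      values.add(v);
    }
    long totalBytes = 0;
    for (byte[] chunk : encodeAll(times, values, 2)) {
      totalBytes += chunk.length;
    }
    System.out.println("encoded bytes: " + totalBytes);
  }
}
```

With around 1M series, submitting one task per series would likely add noticeable scheduling overhead, so a real implementation would probably batch many series into each task. The sketch keeps one task per series only for readability.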