[ 
https://issues.apache.org/jira/browse/IOTDB-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714831#comment-17714831
 ] 

Tian Jiang commented on IOTDB-5792:
-----------------------------------

The design and evaluation are available at:

[https://apache-iotdb.feishu.cn/docx/TAu9dKrFioYQxmxhgUVcVrmmnWf]

Please feel free to leave any comments.

> Parallel encoding in MemTable flush
> -----------------------------------
>
>                 Key: IOTDB-5792
>                 URL: https://issues.apache.org/jira/browse/IOTDB-5792
>             Project: Apache IoTDB
>          Issue Type: Improvement
>          Components: Core/Engine
>            Reporter: Tian Jiang
>            Assignee: Tian Jiang
>            Priority: Major
>              Labels: encoding, flush, pull-request-available
>             Fix For: master branch
>
>
> Currently, there is only one encoding task for each MemTable flushing task. 
> In other words, the encoding performed while flushing a MemTable is fully 
> serialized. Thus, when the MemTable is large, the encoding is considerably 
> time-consuming. This is especially true when the computing power of a single 
> core is low, which is common for commercial servers with many cores.
> In one of my experiments, there were 1M time series (datatype = double) in a 
> MemTable, and the average number of points per series was around 300, making 
> the total size of the MemTable about 5GB. Encoding such a MemTable took, 
> incredibly, over 100s. The system easily enters a reject status because the 
> flushing is so slow.
> Since the encoding process is naturally parallelizable (it is a purely 
> in-memory operation with perfect locality), I would like to propose replacing 
> the single-threaded encoding process with multiple threads.
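
The proposal above can be illustrated with a minimal sketch. It is not the design in the linked document or the IoTDB implementation; it only assumes that each series can be encoded independently of the others. The class ParallelEncodeSketch, the methods encodeSeries and encodeAll, and the raw 8-byte "encoding" are all hypothetical stand-ins chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEncodeSketch {

  // Hypothetical stand-in for a real encoder: stores each double as 8 raw bytes.
  static byte[] encodeSeries(double[] points) {
    ByteBuffer buffer = ByteBuffer.allocate(points.length * Double.BYTES);
    for (double p : points) {
      buffer.putDouble(p);
    }
    return buffer.array();
  }

  // Encodes every series on a fixed-size thread pool instead of a single thread.
  // Results are collected in submission order, so output i belongs to series i.
  static List<byte[]> encodeAll(List<double[]> seriesList, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<byte[]>> futures = new ArrayList<>();
      for (double[] series : seriesList) {
        Callable<byte[]> task = () -> encodeSeries(series);
        futures.add(pool.submit(task));
      }
      List<byte[]> encoded = new ArrayList<>();
      for (Future<byte[]> future : futures) {
        encoded.add(future.get()); // blocks until this series has been encoded
      }
      return encoded;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<double[]> memTable = new ArrayList<>();
    memTable.add(new double[] {1.0, 2.0});
    memTable.add(new double[] {3.0});
    List<byte[]> chunks = encodeAll(memTable, 2);
    System.out.println("encoded series: " + chunks.size());
  }
}
```

Because the tasks share no mutable state, the only coordination point is collecting the futures. Writing the encoded chunks to disk would still need to happen in a defined order, which this sketch does not cover.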



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
