Tian Jiang created IOTDB-5792:
---------------------------------

             Summary: Parallel encoding in MemTable flush
                 Key: IOTDB-5792
                 URL: https://issues.apache.org/jira/browse/IOTDB-5792
             Project: Apache IoTDB
          Issue Type: Improvement
          Components: Core/Engine
            Reporter: Tian Jiang
             Fix For: master branch


Currently, each MemTable flush task runs only one encoding task. In 
other words, encoding during a MemTable flush is fully serialized. Thus, 
when the MemTable is large, encoding becomes considerably 
time-consuming. This is especially true when the computing power of a single 
core is low, which is common for commercial servers with many cores.

In one of my experiments, a MemTable held 1M time series (datatype = double) 
with an average of about 300 points per series, making the total size of the 
MemTable about 5GB. Encoding such a MemTable took, incredibly, over 100s. The 
system easily enters a reject status because the flushing is so slow.

Since the encoding process is naturally parallelizable (it is a purely 
in-memory operation with perfect locality), I would like to propose replacing 
the single-threaded encoding process with multiple threads.
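
A minimal sketch of the idea, not the actual IoTDB flush code: each series is 
encoded as an independent task on a fixed thread pool, and the results are 
collected in submission order so the writer still sees the series in the same 
order as before. The class ParallelEncodeSketch, the methods encodeSeries and 
encodeAll, and the raw 16-bytes-per-point "encoding" are all hypothetical 
stand-ins used only for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEncodeSketch {

    // Hypothetical stand-in for a real encoder: writes every
    // (timestamp, value) pair as 8 + 8 raw bytes.
    static byte[] encodeSeries(long[] times, double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(times.length * 16);
        for (int i = 0; i < times.length; i++) {
            buf.putLong(times[i]);
            buf.putDouble(values[i]);
        }
        return buf.array();
    }

    // Encodes every series on a fixed-size pool. Series share no state,
    // so the tasks need no synchronization among themselves.
    static List<byte[]> encodeAll(List<long[]> times, List<double[]> values, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int s = 0; s < times.size(); s++) {
                final long[] t = times.get(s);
                final double[] v = values.get(s);
                futures.add(pool.submit(() -> encodeSeries(t, v)));
            }
            // Collect in submission order so the output order is deterministic.
            List<byte[]> chunks = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                chunks.add(f.get());
            }
            return chunks;
        } finally {
            pool.shutdown();
        }
    }
}
```

With around 1M series, submitting one task per series may add noticeable 
scheduling overhead; grouping series into batches per task is a likely 
refinement, but that is an assumption rather than something measured here.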





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
