Sicheng Yu created IOTDB-6267:
---------------------------------
Summary: Load 2.0
Key: IOTDB-6267
URL: https://issues.apache.org/jira/browse/IOTDB-6267
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Sicheng Yu
Assignee: Sicheng Yu
- MPP Load has stability problems and compatibility problems with the Pipe
system, and its loading speed can still be optimized.
Issue 1: No upper bound on the total memory used when multiple Load
statements are executed concurrently.
- Currently, Load only strictly controls the upper limit of memory used by a
single Load statement during its execution life cycle.
- When a large number of Load statements are executed concurrently, the total
memory used by these Load statements is unbounded.
- Please refer to MPP Load memory footprint for the memory usage during the
execution life cycle of a single Load statement.
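One way to picture a global bound is a budget shared by all concurrent Load statements. The following is only a minimal sketch under that assumption: the class LoadMemoryBudget and its methods are hypothetical names for illustration and are not taken from the IoTDB code base or from the linked design document.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical process-wide memory budget shared by all Load statements. */
final class LoadMemoryBudget {
    private final long capacityBytes;
    private final AtomicLong used = new AtomicLong();

    LoadMemoryBudget(long capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacityBytes must be > 0");
        }
        this.capacityBytes = capacityBytes;
    }

    /** Reserves the requested bytes, or returns false if the cap would be exceeded. */
    boolean tryReserve(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be >= 0");
        }
        while (true) {
            long current = used.get();
            if (bytes > capacityBytes - current) {
                return false; // caller can wait, retry later, or reject the Load
            }
            if (used.compareAndSet(current, current + bytes)) {
                return true;
            }
        }
    }

    /** Returns previously reserved bytes to the shared budget. */
    void release(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be >= 0");
        }
        while (true) {
            long current = used.get();
            if (bytes > current) {
                throw new IllegalStateException("released more than was reserved");
            }
            if (used.compareAndSet(current, current - bytes)) {
                return;
            }
        }
    }

    long getUsedBytes() {
        return used.get();
    }
}
```

In such a scheme each Load statement would reserve its per-statement limit before starting and release it when its life cycle ends, so the sum over all running statements can never exceed the configured capacity.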
Issue 2: New data added by Load is not properly recognized by the Pipe system.
- The Pipe system currently attaches a ProgressIndex to all new data written
to IoTDB (see the discussion of Key Issues in Pipe System Design and
Implementation).
- In the normal write process, the ProgressIndex is attached by the consensus
layer. However, Load's current two-phase transaction commit does not go
through the consensus layer, so loaded data has no normal progress identifier
and cannot be correctly recognized by the Pipe system when a task restarts.
Issue 3: Too many serial (sequential) steps in the Load TsFile process.
- In the LoadTsFileScheduler class, MPP Load 1.0 works as follows:
- Iterate through each TsFile to be loaded.
- For each TsFile, split first, then send, and start the second phase only
after all sends have completed.
- After the second phase completes, load the next TsFile sequentially.
- Since the TsFile splitting process may involve in-memory computation, disk
IO capacity is not fully utilized while that computation runs.
- For a single TsFile, splitting and sending via Thrift run serially, so the
process waits on disk IO and network IO in turn instead of overlapping them.
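To illustrate how splitting and sending could overlap, here is a minimal producer/consumer sketch. It is an assumption-laden illustration, not the Load 2.0 design: PipelinedLoadSketch, the split and send callbacks, and the use of strings for TsFiles and pieces are all hypothetical, and the second (commit) phase is left out.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch: split TsFiles on one thread while sending pieces on another. */
final class PipelinedLoadSketch {

    /** Returns the number of pieces sent. */
    static int run(List<String> tsFiles,
                   Function<String, List<String>> split,
                   Consumer<String> send,
                   int queueCapacity) throws InterruptedException, ExecutionException {
        // The bounded queue caps how many split pieces are held in memory at once.
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(queueCapacity);
        ExecutorService splitter = Executors.newSingleThreadExecutor();
        try {
            // Producer: reads and splits files (disk IO and in-memory computation).
            Future<?> producer = splitter.submit(() -> {
                try {
                    for (String tsFile : tsFiles) {
                        for (String piece : split.apply(tsFile)) {
                            queue.put(Optional.of(piece));
                        }
                    }
                } finally {
                    queue.put(Optional.empty()); // end-of-stream marker
                }
                return null;
            });

            // Consumer: sends pieces (network IO) while the producer keeps splitting.
            int sent = 0;
            while (true) {
                Optional<String> next = queue.take();
                if (!next.isPresent()) {
                    break;
                }
                send.accept(next.get());
                sent++;
            }
            producer.get(); // rethrows any failure from the splitting side
            return sent;
        } finally {
            splitter.shutdownNow(); // also unblocks the producer if sending failed
        }
    }
}
```

With a single splitter thread the pieces still arrive in file order, but the sender no longer has to wait for a whole TsFile to be split before the first piece goes out.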
Document link: https://apache-iotdb.feishu.cn/docx/UE9Od5caDoLoYJxt4Ptc4s0hnof