Hi, I am working on the long-tail latency problem. There are some cases (blocking points) that block insertion. To make the problem easier to understand, I first describe the write process of IoTDB:
IoTDB maintains several independent engines (storage groups), each of which supports reads and writes. In the following, we focus on one engine. An engine maintains several closed data files and one unclosed data file that receives appended data. In memory, there is only one working memtable (m1) that receives writes. There is also another memtable (m2) that takes the place of m1 when m1 is full and being flushed.

When a data item is inserted:
(1) We insert it into the working memtable.
(2) We check the size of the memtable. If it reaches a threshold, we submit a flush task "after the previous flush task is finished" and switch the two memtables.
(3) We check the size of the unclosed file. If it reaches a threshold, we close it "after the previous flush task is finished".

In the above steps, every "after the previous flush task is finished" blocks the insertion process. One solution is to make all flush and close tasks asynchronous. Some questions need to be carefully considered:
(1) Many memtables may be flushed concurrently to an unclosed file. How do we guarantee the order of serialization?
(2) Once a close task is submitted, a new unclosed file will be created to receive appended data, so many unclosed files may exist at the same time. How will the query and compaction processes be impacted?

Thanks,
Jialin Qiao
School of Software, Tsinghua University

乔嘉林 清华大学 软件学院

> -----Original Message-----
> From: "Xiangdong Huang" <[email protected]>
> Sent: 2019-06-04 23:08:34 (Tuesday)
> To: [email protected], "江天" <[email protected]>
> Cc:
> Subject: Re: [jira] [Created] (IOTDB-112) Avoid long tail insertion which is
> caused by synchronized close-bufferwrite
>
> I attached the histogram of the latency in the JIRA.
>
> The x-axis is the latency while the y-axis is the cumulative distribution.
> We can see that about 30% insertion can be finished in 20ms, and 60%
> insertion can be finished in 40ms even though the IoTDB instance is serving
> for a heavy workload...
> So, eliminating the long tail insertion can make
> the average latency far better.
>
> If someone is working on the refactor_overflow or refactor_bufferwrite,
> please pay attention to the code branch for this issue.
>
> Best,
>
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
>
> xiangdong Huang (JIRA) <[email protected]> wrote on Tue, Jun 4, 2019 at 11:00 PM:
>
> > xiangdong Huang created IOTDB-112:
> > -------------------------------------
> >
> > Summary: Avoid long tail insertion which is caused by
> > synchronized close-bufferwrite
> > Key: IOTDB-112
> > URL: https://issues.apache.org/jira/browse/IOTDB-112
> > Project: Apache IoTDB
> > Issue Type: Improvement
> > Reporter: xiangdong Huang
> >
> >
> > In our test, IoTDB has a good insertion performance, and the average
> > latency can be ~200 ms in a given workload and hardware.
> >
> > However, when we draw the histogram of the latency, we find that 97.5%
> > latencies are less than 200 ms, while 2.7% latencies are greater. The
> > result shows that there are some long tail latency.
> >
> > Then we find that some insertion latencies are about 30 seconds... (but
> > the ratio is less than 0.5%). Indeed, for each connection, a long tail
> > insertion appears per 1 or 2 minutes....
> >
> > By reading source codes, I think it is because that in the insertion
> > function,
> >
> > `private void insertBufferWrite(FileNodeProcessor fileNodeProcessor, long
> > timestamp,
> > boolean isMonitor, TSRecord tsRecord, String deviceId)`,
> >
> > if the corresponding TsFile is too large, the function is blocked until
> > the memtable is flushed on disk and the TsFile is sealed (we call it as
> > closing a TsFile). The latencies of the long tail insertions are very close
> > to the time cost of flushing and sealing a TsFile.
> >
> > So, if we set the closing function using the async mode, we can avoid the
> > long tail insertion.
> >
> > However, there are some side effects we have to fix:
> > # At the same time, if a new insertion comes, then a new memtable should
> > be assigned, and a new unsealed TsFile is created;
> > # That means that there are more than 1 unsealed TsFiles if the system is
> > crashed before the closing function is finished. So, we have to modify the
> > startup process to recover these files.
> >
> > Is there any other side effect that I have to pay attention to?
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v7.6.3#76005)
> >
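To make the asynchronous flush idea in this thread more concrete, here is a minimal Java sketch of one possible approach to the ordering question. It is not IoTDB code: the class AsyncFlushSketch and all of its members are hypothetical, a memtable is reduced to a list of timestamps, and the unclosed file is reduced to an in-memory list. The assumption is that each engine owns a single-threaded flush executor, so flush tasks run one at a time in submission order while insert() only swaps the memtable and returns. The sketch allocates a fresh memtable on every switch instead of alternating between m1 and m2, and it does not address back-pressure, file closing, or crash recovery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of one engine with non-blocking, ordered flushes.
class AsyncFlushSketch {
    private final int memtableThreshold;
    // A single worker thread runs flush tasks one by one, in submission order.
    private final ExecutorService flushExecutor = Executors.newSingleThreadExecutor();
    // Stand-in for the unclosed data file: one entry per flushed memtable.
    private final List<List<Long>> unclosedFile = new ArrayList<>();
    // The working memtable that receives writes.
    private List<Long> working = new ArrayList<>();

    AsyncFlushSketch(int memtableThreshold) {
        this.memtableThreshold = memtableThreshold;
    }

    // Insert one data item; never waits for a previous flush to finish.
    synchronized void insert(long timestamp) {
        working.add(timestamp);
        if (working.size() >= memtableThreshold) {
            final List<Long> full = working;
            working = new ArrayList<>();              // switch memtables
            flushExecutor.submit(() -> flush(full));  // queue the flush
        }
    }

    // Runs on the flush thread only: append the memtable to the "file".
    private void flush(List<Long> memtable) {
        synchronized (unclosedFile) {
            unclosedFile.add(memtable);
        }
    }

    // Wait for all queued flushes, then return a copy of what was written.
    List<List<Long>> closeAndGetFlushed() {
        flushExecutor.shutdown();
        try {
            if (!flushExecutor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("flush tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for flushes", e);
        }
        synchronized (unclosedFile) {
            return new ArrayList<>(unclosedFile);
        }
    }
}
```

With a threshold of 2, inserting timestamps 1 to 6 queues three flush tasks, and closeAndGetFlushed() returns them in the order they were submitted. Because nothing limits the queue here, memory can grow if flushing is slower than insertion, so a real design would still need some limit on the number of pending memtables.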
