Re: Avoid long-tail insertion

Jialin Qiao Thu, 27 Jun 2019 23:06:09 -0700

Hi,

1. TVList is an evolved version of PrimitiveArrayList that adds the ability 
to sort data. The underlying data structure is still arrays of primitive 
types. Because sorting is built in, it avoids ArrayList and TimeValuePair, 
which create many temporary objects (and thus a lot of GC overhead in the 
JVM). We added a test, LongTVListTest, to compare the performance of TVList 
and PrimitiveArrayList. The result shows that the total write+sort+read time 
of TVList is about half that of PrimitiveArrayList.
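
To illustrate the idea only (this is a minimal sketch, not the TVList code in 
the branch): the class name SimpleLongTVList and its methods are hypothetical, 
and the quicksort is just one possible in-place sort. It shows how timestamps 
and values can live in two parallel long[] arrays and be sorted together 
without creating any Long or pair objects.

```java
import java.util.Arrays;

// Hypothetical sketch of a primitive-array time/value list; not the real TVList.
final class SimpleLongTVList {
    private long[] times = new long[16];
    private long[] values = new long[16];
    private int size = 0;

    // Append one (timestamp, value) pair; no boxing, no per-pair object.
    void put(long time, long value) {
        if (size == times.length) {
            times = Arrays.copyOf(times, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        times[size] = time;
        values[size] = value;
        size++;
    }

    int size() { return size; }

    long getTime(int i) { checkIndex(i); return times[i]; }

    long getValue(int i) { checkIndex(i); return values[i]; }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
    }

    // Sort both arrays in place by timestamp (plain quicksort, not stable).
    void sort() { quickSort(0, size - 1); }

    private void quickSort(int lo, int hi) {
        if (lo >= hi) return;
        long pivot = times[lo + (hi - lo) / 2];
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (times[i] < pivot) i++;
            while (times[j] > pivot) j--;
            if (i <= j) {
                swap(i, j);
                i++;
                j--;
            }
        }
        quickSort(lo, j);
        quickSort(i, hi);
    }

    // Swap the pair at position a with the pair at position b.
    private void swap(int a, int b) {
        long t = times[a]; times[a] = times[b]; times[b] = t;
        long v = values[a]; values[a] = values[b]; values[b] = v;
    }

    public static void main(String[] args) {
        SimpleLongTVList list = new SimpleLongTVList();
        list.put(3L, 30L);
        list.put(1L, 10L);
        list.put(2L, 20L);
        list.sort();
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.getTime(i) + "=" + list.getValue(i));
        }
    }
}
```

An ArrayList of TimeValuePair would allocate one object per point (plus boxed 
values); here the only allocations are the two arrays when they grow.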


2. Version 2 is the new storage engine. We keep version 1 only for reference. 
Once version 2 is stable, we will remove version 1.

3. Thanks, the change from PR109 has been added to this branch.

Best,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -----Original Message-----
> From: "Xiangdong Huang" <[email protected]>
> Sent: 2019-06-27 22:28:20 (Thursday)
> To: [email protected]
> Cc:
> Subject: Re: Re: Re: Avoid long-tail insertion
> 
> Hi,
> 
> I am reading the code on the branch feature_async_close_tsfile... and
> I have some questions about your current work:
> 
> 1. Why did you replace PrimitiveArrayList with TVList?
>  As far as I know, PrimitiveArrayList is for avoiding auto-boxing, and it
> uses some tricks to avoid writing too much similar code.
> So, does TVList have better performance?
> 
> 2. Some classes have both version 1 and version 2, e.g., FileNodeManager
> and FileNodeManagerV2. Will you retain both of them or keep only version 2?
> 
> 3. The behavior of the NativeRestorableIOWriter is changed in this
> branch... please note the related PR (PR109)
> 
> I have to say, this branch contains a large number of modifications....
> 
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
> 
>  黄向东
> 清华大学 软件学院
> 
> 
> Jialin Qiao <[email protected]> wrote on Mon, Jun 24, 2019 at 8:24 PM:
> 
> >
> > Yes, there are many changes. The branch I am working on is
> > feature_async_close_tsfile.
> > Anyone interested is welcome to join and discuss.
> >
> > Best,
> > --
> > Jialin Qiao
> > School of Software, Tsinghua University
> >
> > 乔嘉林
> > 清华大学 软件学院
> >
> > > -----Original Message-----
> > > From: "Xiangdong Huang" <[email protected]>
> > > Sent: 2019-06-23 10:59:29 (Sunday)
> > > To: [email protected]
> > > Cc:
> > > Subject: Re: Re: Avoid long-tail insertion
> > >
> > > Hi,
> > >
> > > Once your work branch is almost ready, let me know so I can help
> > > review it.
> > > I think it is a HUGE PR...
> > >
> > > -----------------------------------
> > > Xiangdong Huang
> > > School of Software, Tsinghua University
> > >
> > >  黄向东
> > > 清华大学 软件学院
> > >
> > >
> > > Jialin Qiao <[email protected]> wrote on Sat, Jun 22, 2019 at 9:57 PM:
> > >
> > > > Hi Xiangdong,
> > > >
> > > > I will merge this patch. Letting "Directories" manage the folders of
> > > > both sequence and unSequence files is a good idea.
> > > >
> > > > However, the name "Directories" is not clear. It would be better to
> > > > rename it to "DirectoryManager".
> > > >
> > > > Best,
> > > > --
> > > > Jialin Qiao
> > > > School of Software, Tsinghua University
> > > >
> > > > 乔嘉林
> > > > 清华大学 软件学院
> > > >
> > > > > -----Original Message-----
> > > > > From: "Xiangdong Huang" <[email protected]>
> > > > > Sent: 2019-06-22 16:35:29 (Saturday)
> > > > > To: [email protected]
> > > > > Cc:
> > > > > Subject: Re: Avoid long-tail insertion
> > > > >
> > > > > Hi jialin,
> > > > >
> > > > > I have submitted some modifications that:
> > > > >
> > > > > * add the overflow data folder location setting to
> > > > > iotdb-engine.properties;
> > > > > * let Directories.java manage the above folder.
> > > > >
> > > > > If you need to refactor the overflow while solving the long-tail
> > > > > issue, you can apply the patch from [1] first to simplify your work.
> > > > >
> > > > > [1]
> > > > > https://issues.apache.org/jira/secure/attachment/12972547/overflow-folder.patch
> > > > >
> > > > > Best,
> > > > > -----------------------------------
> > > > > Xiangdong Huang
> > > > > School of Software, Tsinghua University
> > > > >
> > > > >  黄向东
> > > > > 清华大学 软件学院
> > > > >
> > > > >
> > > > > Xiangdong Huang <[email protected]> wrote on Sat, Jun 22, 2019 at 3:19 PM:
> > > > >
> > > > > > If you change the process like this, i.e., there is more than one
> > > > > > unsealed TsFile for each storage group, then you have to modify
> > > > > > the WAL module, because the current WAL module only recognizes
> > > > > > the last unsealed TsFile.
> > > > > >
> > > > > > By the way, "sealed" is better than "closed", I think. A sealed
> > > > > > file is a file that has the magic string at both the head and
> > > > > > the tail.
> > > > > >
> > > > > > Best,
> > > > > > -----------------------------------
> > > > > > Xiangdong Huang
> > > > > > School of Software, Tsinghua University
> > > > > >
> > > > > >  黄向东
> > > > > > 清华大学 软件学院
> > > > > >
> > > > > >
> > > > > > Jialin Qiao <[email protected]> wrote on Sat, Jun 22, 2019 at 2:54 PM:
> > > > > >
> > > > > >>
> > > > > >> Hi, I am solving the long-tail latency problem.
> > > > > >>
> > > > > > >> There are some cases (blocking points) that block insertion.
> > > > > > >> For a better understanding of this problem, I first introduce
> > > > > > >> the writing process of IoTDB:
> > > > > > >>
> > > > > > >> IoTDB maintains several independent engines (storage groups)
> > > > > > >> that support reads and writes. In the following, we focus on
> > > > > > >> one engine. An engine maintains several closed data files and
> > > > > > >> one unclosed data file that receives appended data. In memory,
> > > > > > >> there is only one working memtable (m1) that receives writes.
> > > > > > >> There is also another memtable (m2) that will take the place of
> > > > > > >> m1 when m1 is full and being flushed.
> > > > > >>
> > > > > >> When a data item is inserted:
> > > > > >>
> > > > > > >> (1) We insert it into the working memtable.
> > > > > > >> (2) We check the size of the memtable. If it reaches a
> > > > > > >> threshold, we submit a flush task “after the previous flush
> > > > > > >> task is finished” and switch the two memtables.
> > > > > > >> (3) We check the size of the unclosed file. If it reaches a
> > > > > > >> threshold, we close it “after the previous flush task is
> > > > > > >> finished”.
> > > > > >>
> > > > > > >> In the above steps, every "after the previous flush task is
> > > > > > >> finished" blocks the insertion process. One solution is to make
> > > > > > >> all flush and close tasks asynchronous. Some questions need to
> > > > > > >> be carefully considered:
> > > > > > >>
> > > > > > >> (1) Many memtables may be flushed concurrently to an unclosed
> > > > > > >> file. How do we guarantee the order of serialization?
> > > > > > >> (2) Once a close task is submitted, a new unclosed file will be
> > > > > > >> created and will receive appended data. So there will exist
> > > > > > >> many unclosed files. How will the query and compaction
> > > > > > >> processes be impacted?
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Jialin Qiao
> > > > > >> School of Software, Tsinghua University
> > > > > >>
> > > > > >> 乔嘉林
> > > > > >> 清华大学 软件学院
> > > > > >>
> > > > > > >> > -----Original Message-----
> > > > > > >> > From: "Xiangdong Huang" <[email protected]>
> > > > > > >> > Sent: 2019-06-04 23:08:34 (Tuesday)
> > > > > > >> > To: [email protected], "江天" <[email protected]>
> > > > > > >> > Cc:
> > > > > > >> > Subject: Re: [jira] [Created] (IOTDB-112) Avoid long tail insertion
> > > > > > >> > which is caused by synchronized close-bufferwrite
> > > > > >> >
> > > > > >> > I attached the histogram of the latency in the JIRA.
> > > > > >> >
> > > > > > >> > The x-axis is the latency while the y-axis is the cumulative
> > > > > > >> > distribution. We can see that about 30% of insertions finish
> > > > > > >> > within 20 ms, and 60% finish within 40 ms, even though the
> > > > > > >> > IoTDB instance is serving a heavy workload... So, eliminating
> > > > > > >> > the long-tail insertions can make the average latency far
> > > > > > >> > better.
> > > > > > >> >
> > > > > > >> > If someone is working on refactor_overflow or
> > > > > > >> > refactor_bufferwrite, please pay attention to the code branch
> > > > > > >> > for this issue.
> > > > > >> >
> > > > > >> > Best,
> > > > > >> >
> > > > > >> > -----------------------------------
> > > > > >> > Xiangdong Huang
> > > > > >> > School of Software, Tsinghua University
> > > > > >> >
> > > > > >> >  黄向东
> > > > > >> > 清华大学 软件学院
> > > > > >> >
> > > > > >> >
> > > > > > >> > xiangdong Huang (JIRA) <[email protected]> wrote on Tue, Jun 4, 2019 at 11:00 PM:
> > > > > >> >
> > > > > >> > > xiangdong Huang created IOTDB-112:
> > > > > >> > > -------------------------------------
> > > > > >> > >
> > > > > > >> > >              Summary: Avoid long tail insertion which is caused by synchronized close-bufferwrite
> > > > > > >> > >                  Key: IOTDB-112
> > > > > > >> > >                  URL: https://issues.apache.org/jira/browse/IOTDB-112
> > > > > >> > >              Project: Apache IoTDB
> > > > > >> > >           Issue Type: Improvement
> > > > > >> > >             Reporter: xiangdong Huang
> > > > > >> > >
> > > > > >> > >
> > > > > > >> > > In our test, IoTDB has good insertion performance, and the
> > > > > > >> > > average latency can be ~200 ms for a given workload and
> > > > > > >> > > hardware.
> > > > > > >> > >
> > > > > > >> > > However, when we draw the histogram of the latency, we find
> > > > > > >> > > that 97.5% of latencies are less than 200 ms, while 2.7% of
> > > > > > >> > > latencies are greater. The result shows that there is some
> > > > > > >> > > long-tail latency.
> > > > > >> > >
> > > > > > >> > > Then we find that some insertion latencies are about 30
> > > > > > >> > > seconds... (but the ratio is less than 0.5%). Indeed, for
> > > > > > >> > > each connection, a long-tail insertion appears every 1 or 2
> > > > > > >> > > minutes....
> > > > > >> > >
> > > > > > >> > > By reading the source code, I think it is because, in the
> > > > > > >> > > insertion function,
> > > > > > >> > >
> > > > > > >> > > `private void insertBufferWrite(FileNodeProcessor fileNodeProcessor, long timestamp,
> > > > > > >> > >  boolean isMonitor, TSRecord tsRecord, String deviceId)`,
> > > > > > >> > >
> > > > > > >> > > if the corresponding TsFile is too large, the function is
> > > > > > >> > > blocked until the memtable is flushed to disk and the TsFile
> > > > > > >> > > is sealed (we call this closing a TsFile). The latencies of
> > > > > > >> > > the long-tail insertions are very close to the time cost of
> > > > > > >> > > flushing and sealing a TsFile.
> > > > > >> > >
> > > > > > >> > > So, if we run the closing function in async mode, we can
> > > > > > >> > > avoid the long-tail insertion.
> > > > > >> > >
> > > > > > >> > > However, there are some side effects we have to fix:
> > > > > > >> > >  # At the same time, if a new insertion comes, then a new
> > > > > > >> > > memtable should be assigned, and a new unsealed TsFile is
> > > > > > >> > > created;
> > > > > > >> > >  # That means that there is more than 1 unsealed TsFile if
> > > > > > >> > > the system crashes before the closing function is finished.
> > > > > > >> > > So, we have to modify the startup process to recover these
> > > > > > >> > > files.
> > > > > >> > >
> > > > > > >> > > Is there any other side effect that I have to pay attention
> > > > > > >> > > to?
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > --
> > > > > >> > > This message was sent by Atlassian JIRA
> > > > > >> > > (v7.6.3#76005)
> > > > > >> > >
> > > > > >>
> > > > > >
> > > >
> >
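
For readers following the thread: the write path described in the 2019-06-22 
message (insert into the working memtable, switch memtables at a threshold, 
flush in the background) can be sketched roughly as below. This is a minimal 
sketch under stated assumptions, not the code in feature_async_close_tsfile. 
The class AsyncFlushSketch, its methods, and the in-memory list that stands in 
for the unclosed file are all hypothetical. It uses a single-threaded executor, 
which is one possible answer to the "order of serialization" question because 
its tasks run one at a time in submission order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: inserts never wait for a flush to finish.
final class AsyncFlushSketch {
    private final int threshold;
    private long[] working;  // working memtable (m1 in the thread)
    private int count = 0;
    // One flush thread: flush tasks run sequentially, in submission order.
    private final ExecutorService flushPool = Executors.newSingleThreadExecutor();
    // Stand-in for the unclosed file; each element is one flushed memtable.
    private final List<long[]> file = Collections.synchronizedList(new ArrayList<>());

    AsyncFlushSketch(int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
        this.working = new long[threshold];
    }

    // Insert one value; when the memtable is full, switch to a fresh one
    // and hand the full one to the flush thread without blocking.
    synchronized void insert(long value) {
        working[count++] = value;
        if (count == threshold) {
            final long[] full = working;
            working = new long[threshold];
            count = 0;
            flushPool.execute(() -> file.add(full)); // simulated disk write
        }
    }

    // Flush the remainder, wait for queued flushes, return chunks in write order.
    // The sketch does not support inserts after close().
    synchronized List<long[]> close() {
        if (count > 0) {
            final long[] rest = Arrays.copyOf(working, count);
            count = 0;
            flushPool.execute(() -> file.add(rest));
        }
        flushPool.shutdown();
        try {
            if (!flushPool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("flush did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for flush", e);
        }
        return new ArrayList<>(file);
    }

    public static void main(String[] args) {
        AsyncFlushSketch engine = new AsyncFlushSketch(2);
        for (long v = 1; v <= 5; v++) {
            engine.insert(v);
        }
        for (long[] chunk : engine.close()) {
            System.out.println(Arrays.toString(chunk));
        }
    }
}
```

The sketch leaves out everything the thread flags as open: closing and sealing 
files, multiple unsealed files, WAL changes, and crash recovery.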
