Wow, thanks, Julian! Let me run some experiments to see which approach gives the best result : )
Best,
-----------------------------------
Zesong Sun
School of Software, Tsinghua University

孙泽嵩
清华大学 软件学院

> On June 19, 2020, at 14:14, Julian Feinauer <[email protected]> wrote:
>
> Oh and another note. By using a faster serialization library than the Java
> default, we could ideally speed up the process by up to 10x.
>
> See e.g. here: https://github.com/RuedigerMoeller/fast-serialization
>
> Julian
>
> Get Outlook for Android<https://aka.ms/ghei36>
>
> ________________________________
> From: Julian Feinauer <[email protected]>
> Sent: Friday, June 19, 2020 8:11:56 AM
> To: [email protected] <[email protected]>
> Subject: Re: [IOTDB-726] CheckPoint of MTree
>
> What about using some kind of cache that spills to disk? That way we would be
> up in no time and just lazy-load devices when needed.
>
> I remember that Ehcache has such features (https://www.baeldung.com/ehcache)
> but there are other implementations as well.
>
> Julian
>
> Get Outlook for Android<https://aka.ms/ghei36>
>
> ________________________________
> From: 孙泽嵩 <[email protected]>
> Sent: Friday, June 19, 2020 7:57:51 AM
> To: [email protected] <[email protected]>
> Subject: Re: [IOTDB-726] CheckPoint of MTree
>
> Hi Jialin,
>
> I did an experiment for 1M timeseries, and the serialization process takes
> 971 ms.
>
> Maybe we could consider creating a snapshot when the MTree has not changed for
> a long time (for example, one hour).
>
> In this way, the client will not be stuck and users may not even notice it.
>
>
> Best,
> -----------------------------------
> Zesong Sun
> School of Software, Tsinghua University
>
> 孙泽嵩
> 清华大学 软件学院
>
>> On June 18, 2020, at 16:19, 孙泽嵩 <[email protected]> wrote:
>>
>> Hi,
>>
>> Good points!
>>
>>> how about adding a "create snapshot for schema" sql to let users trigger
>>> this manually
>>
>> I’ll add this sql in a new PR.
>>
>>> how long it takes to recover from a 1M timeseries snapshot.
>>
>> Based on my previous experiment, it takes about 6s as you said.
>>
>>> how long it takes to create a snapshot for 1M/10M timeseries?
>>
>> I didn’t time this … I’ll do an experiment after making the suggested
>> changes in the current PR [1]
>>
>>
>> [1] https://github.com/apache/incubator-iotdb/pull/1384
>>
>>
>> Best,
>> -----------------------------------
>> Zesong Sun
>> School of Software, Tsinghua University
>>
>> 孙泽嵩
>> 清华大学 软件学院
>>
>>> On June 18, 2020, at 14:39, Jialin Qiao <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Currently, the snapshot is triggered every xxx lines in mlog.txt.
>>> With 20M timeseries, the default 10k lines will cause too many
>>> snapshots, which will block timeseries creation.
>>> However, if we enlarge the condition to 1M, the last 1M will take about 6s
>>> to recover, about 160K per second.
>>>
>>> So, my concern is how long it takes to create a snapshot for 1M/10M
>>> timeseries? And how long it takes to recover from a 1M timeseries snapshot.
>>>
>>> Besides, how about adding a "create snapshot for schema" sql to let users
>>> trigger this manually?
>>>
>>> Thanks,
>>> --
>>> Jialin Qiao
>>> School of Software, Tsinghua University
>>>
>>> 乔嘉林
>>> 清华大学 软件学院
>>>
>>>> -----Original Message-----
>>>> From: "孙泽嵩" <[email protected]>
>>>> Sent: 2020-06-15 19:14:08 (Monday)
>>>> To: [email protected]
>>>> Cc:
>>>> Subject: Re: [IOTDB-726] CheckPoint of MTree
>>>>
>>>> Hi Julian,
>>>>
>>>> Currently I’m just using a plain text file.
>>>>
>>>> But I could consider and try with RocksDB : )
>>>> I also noticed that there is an issue related to RocksDB integration [1].
>>>>
>>>>
>>>> [1] https://issues.apache.org/jira/browse/IOTDB-767
>>>>
>>>>
>>>> Best,
>>>> -----------------------------------
>>>> Zesong Sun
>>>> School of Software, Tsinghua University
>>>>
>>>> 孙泽嵩
>>>> 清华大学 软件学院
>>>>
>>>>> On June 15, 2020, at 19:00, Julian Feinauer <[email protected]> wrote:
>>>>>
>>>>> Hi Zesong,
>>>>>
>>>>> this is an excellent idea!
>>>>> Do you serialize the snapshot as a plain text file?
>>>>> Or would it make sense to use something like RocksDB for something like
>>>>> that?
>>>>>
>>>>> Julian
>>>>>
>>>>> On 15.06.20, 12:12, "孙泽嵩" <[email protected]> wrote:
>>>>>
>>>>> Greetings,
>>>>>
>>>>> I’m currently working on issue [IOTDB-726] CheckPoint of MTree [1]
>>>>>
>>>>> When there is a large number of timeseries, it takes a long time to
>>>>> restart IoTDB by reading mlog.txt and executing the commands line by
>>>>> line.
>>>>> For example, it takes about 2 minutes to restart with 20M timeseries.
>>>>>
>>>>> To solve this problem, a “checkpoint” is designed and added to MTree to
>>>>> reduce the time spent reading mlog when IoTDB restarts:
>>>>> Generate a snapshot, which includes the serialization of MTree, every
>>>>> time mlog reaches a certain number of lines.
>>>>> When a new snapshot is generated, the old one is deleted. The snapshot
>>>>> file and mlog.txt are in the same directory.
>>>>>
>>>>> Users could configure the threshold number of mlog lines. By default,
>>>>> a snapshot is generated for every 100k lines.
>>>>>
>>>>> I’ve already made a demo and shown that the method speeds up the
>>>>> restarting process.
>>>>> For the part that reads mlog.txt and initializes MTree, it reduces time
>>>>> by 28.3% (16.6s with the original method, 11.9s with the new demo, both
>>>>> for 2M timeseries).
>>>>>
>>>>> I would like to make a PR afterwards. If you have any suggestions about
>>>>> the design, feel free to discuss with me.
>>>>>
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/IOTDB-726
>>>>>
>>>>>
>>>>> Best,
>>>>> -----------------------------------
>>>>> Zesong Sun
>>>>> School of Software, Tsinghua University
>>>>>
>>>>> 孙泽嵩
>>>>> 清华大学 软件学院
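
For readers following the thread, the checkpoint idea from the original message (take a snapshot every N mlog lines, keep only the newest snapshot, and on restart load it and replay only the mlog lines written after it) might be sketched roughly as below. This is a minimal in-memory illustration, not IoTDB's actual code: the class name MLogCheckpointSketch, the "create <path>" log format, the sorted set standing in for the MTree, and the list standing in for mlog.txt are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of snapshot-plus-log-tail recovery; not IoTDB's real MTree or mlog code. */
class MLogCheckpointSketch {
    private final int snapshotThreshold;                   // mlog lines between two snapshots
    private final List<String> mlog = new ArrayList<>();   // stands in for mlog.txt
    private final TreeSet<String> tree = new TreeSet<>();  // stands in for the MTree
    private TreeSet<String> snapshot = new TreeSet<>();    // latest snapshot only
    private int snapshotLine = 0;                          // mlog lines already covered by the snapshot

    MLogCheckpointSketch(int snapshotThreshold) {
        this.snapshotThreshold = snapshotThreshold;
    }

    /** Log the command, apply it, and snapshot once enough new lines have accumulated. */
    void createTimeseries(String path) {
        mlog.add("create " + path);
        tree.add(path);
        if (mlog.size() - snapshotLine >= snapshotThreshold) {
            createSnapshot();
        }
    }

    /** Replace the previous snapshot with a copy of the current tree. */
    void createSnapshot() {
        snapshot = new TreeSet<>(tree);
        snapshotLine = mlog.size();
    }

    /** Rebuild the tree from the snapshot, then replay only the log tail. Returns lines replayed. */
    int recover() {
        tree.clear();
        tree.addAll(snapshot);
        int replayed = 0;
        for (int i = snapshotLine; i < mlog.size(); i++) {
            String[] parts = mlog.get(i).split(" ", 2);
            if (parts[0].equals("create")) {
                tree.add(parts[1]);
            }
            replayed++;
        }
        return replayed;
    }

    int size() {
        return tree.size();
    }

    public static void main(String[] args) {
        MLogCheckpointSketch sketch = new MLogCheckpointSketch(100);
        for (int i = 0; i < 250; i++) {
            sketch.createTimeseries("root.sg.d" + i + ".s0");
        }
        // Snapshots were taken at lines 100 and 200, so only 50 lines need replaying.
        int replayed = sketch.recover();
        System.out.println("replayed=" + replayed + ", timeseries=" + sketch.size());
    }
}
```

The threshold is the trade-off Jialin raises: a small value means frequent snapshots during bulk creation, while a large value leaves a longer log tail to replay on restart.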
