> I did an experiment for 1M timeseries, and the serialization process costs 971ms.
971ms for serializing 1M timeseries, but 6 seconds for deserializing?

> I didn’t time this … I’ll do an experiment after fixing the suggested changes in current PR [1]

The problem with the current PR is that the snapshot grows larger and larger as the system keeps running. Any idea about this case?

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院

On Fri, Jun 19, 2020 at 2:20 PM, 孙泽嵩 <[email protected]> wrote:

> Wow, thanks, Julian!
>
> Let me try and do experiments to get the best result : )
>
> Best,
> -----------------------------------
> Zesong Sun
> School of Software, Tsinghua University
>
> 孙泽嵩
> 清华大学 软件学院
>
> > On Jun 19, 2020, at 14:14, Julian Feinauer <[email protected]> wrote:
> >
> > Oh and another note. By using a faster serialization lib than the Java default we could ideally speed up the process up to 10x.
> >
> > See e.g. here https://github.com/RuedigerMoeller/fast-serialization
> >
> > Julian
> >
> > Get Outlook for Android<https://aka.ms/ghei36>
> >
> > ________________________________
> > From: Julian Feinauer <[email protected]>
> > Sent: Friday, June 19, 2020 8:11:56 AM
> > To: [email protected] <[email protected]>
> > Subject: Re: [IOTDB-726] CheckPoint of MTree
> >
> > What about using some kind of cache that spills to disk? That way we would be up in no time and just lazy load devices when needed.
> >
> > I remember that Ehcache has such features (https://www.baeldung.com/ehcache) but there are other implementations as well.
> >
> > Julian
> >
> > Get Outlook for Android<https://aka.ms/ghei36>
> >
> > ________________________________
> > From: 孙泽嵩 <[email protected]>
> > Sent: Friday, June 19, 2020 7:57:51 AM
> > To: [email protected] <[email protected]>
> > Subject: Re: [IOTDB-726] CheckPoint of MTree
> >
> > Hi Jialin,
> >
> > I did an experiment for 1M timeseries, and the serialization process costs 971ms.
> >
> > Maybe we could consider creating a snapshot when the MTree is not changed for a long time (for example, one hour).
> >
> > In this way, the client will not be stuck and users may not even notice it.
> >
> >
> > Best,
> > -----------------------------------
> > Zesong Sun
> > School of Software, Tsinghua University
> >
> > 孙泽嵩
> > 清华大学 软件学院
> >
> >> On Jun 18, 2020, at 16:19, 孙泽嵩 <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> Good opinions!
> >>
> >>> how about adding a "create snapshot for schema" sql to let users trigger this manually
> >>
> >> I’ll add this sql in a new PR.
> >>
> >>> how long it takes to recover from a 1M timeseries snapshot.
> >>
> >> Based on my previous experiment, it takes about 6s as you said.
> >>
> >>> how long it takes to create a snapshot for 1M/10M timeseries?
> >>
> >> I didn’t time this … I’ll do an experiment after fixing the suggested changes in current PR [1]
> >>
> >>
> >> [1] https://github.com/apache/incubator-iotdb/pull/1384
> >>
> >>
> >> Best,
> >> -----------------------------------
> >> Zesong Sun
> >> School of Software, Tsinghua University
> >>
> >> 孙泽嵩
> >> 清华大学 软件学院
> >>
> >>> On Jun 18, 2020, at 14:39, Jialin Qiao <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Currently, the snapshot is triggered every xxx lines in mlog.txt.
> >>> When meeting 20M timeseries, the default 10k lines will cause too many snapshots, which will block the creating.
> >>> However, if we enlarge the condition to 1M, the last 1M will take about 6s to recover, about 160K per second.
> >>>
> >>> So, my concern is how long it takes to create a snapshot for 1M/10M timeseries? And how long it takes to recover from a 1M timeseries snapshot.
> >>>
> >>> Besides, how about adding a "create snapshot for schema" sql to let users trigger this manually?
> >>>
> >>> Thanks,
> >>> --
> >>> Jialin Qiao
> >>> School of Software, Tsinghua University
> >>>
> >>> 乔嘉林
> >>> 清华大学 软件学院
> >>>
> >>>> -----Original Message-----
> >>>> From: "孙泽嵩" <[email protected]>
> >>>> Sent: 2020-06-15 19:14:08 (Monday)
> >>>> To: [email protected]
> >>>> Cc:
> >>>> Subject: Re: [IOTDB-726] CheckPoint of MTree
> >>>>
> >>>> Hi Julian,
> >>>>
> >>>> Currently I’m just using a plain text file.
> >>>>
> >>>> But I could consider and try with RocksDB : )
> >>>> I also noticed that there is an issue related to RocksDB integration [1].
> >>>>
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/IOTDB-767
> >>>>
> >>>>
> >>>> Best,
> >>>> -----------------------------------
> >>>> Zesong Sun
> >>>> School of Software, Tsinghua University
> >>>>
> >>>> 孙泽嵩
> >>>> 清华大学 软件学院
> >>>>
> >>>>> On Jun 15, 2020, at 19:00, Julian Feinauer <[email protected]> wrote:
> >>>>>
> >>>>> Hi Zesong,
> >>>>>
> >>>>> this is an excellent idea!
> >>>>> Do you serialize the snapshot as a plain text file?
> >>>>> Or would it make sense to use something like RocksDB for something like that?
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>> On 15.06.20, 12:12, "孙泽嵩" <[email protected]> wrote:
> >>>>>
> >>>>> Greetings,
> >>>>>
> >>>>> I’m currently working on issue [IOTDB-726] CheckPoint of MTree [1]
> >>>>>
> >>>>> In the situation that there exist a large number of timeseries, it would take a long time to restart IoTDB by reading mlog.txt and executing the commands line by line.
> >>>>> For example, it takes about 2 minutes to restart with 20M timeseries.
> >>>>>
> >>>>> To solve this problem, “checkpoint” is designed and added to MTree to reduce the time of reading mlog when IoTDB restarts:
> >>>>> Generate a snapshot, which includes the serialization of MTree, every time mlog reaches a certain number of lines.
> >>>>> When a new snapshot is generated, the old one is deleted. Snapshot file and mlog.txt are in the same directory.
> >>>>>
> >>>>> Users could configure the threshold number of the mlog lines. By default, a snapshot is generated for every 100k lines.
> >>>>>
> >>>>> I’ve already made a demo and proved that the method could speed up the restarting process.
> >>>>> As for the reading mlog.txt and initializing MTree part, it reduces time by 28.3% (16.6s with the original method, 11.9s with the new demo, both for 2M timeseries).
> >>>>>
> >>>>> I would like to make a PR afterwards. If you have any suggestions about the design, feel free to discuss with me.
> >>>>>
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/browse/IOTDB-726
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>> -----------------------------------
> >>>>> Zesong Sun
> >>>>> School of Software, Tsinghua University
> >>>>>
> >>>>> 孙泽嵩
> >>>>> 清华大学 软件学院
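
The checkpoint scheme described in the original proposal (take a snapshot every N mlog lines, replace the old snapshot, and replay only the remaining mlog lines on restart) could be sketched roughly as below. This is a toy Python illustration, not IoTDB's implementation: the class name `SchemaLog`, the file name `mtree.snapshot`, the `create,<path>` log line format, the JSON encoding, and the idea of storing the covered line count inside the snapshot are all assumptions made for the example. It also assumes mlog.txt is kept whole rather than truncated after a snapshot.

```python
import json
import os
import tempfile


class SchemaLog:
    """Toy stand-in for MTree + mlog.txt; every name here is hypothetical."""

    def __init__(self, directory, snapshot_threshold=100_000):
        # Snapshot file and mlog.txt live in the same directory.
        self.mlog_path = os.path.join(directory, "mlog.txt")
        self.snapshot_path = os.path.join(directory, "mtree.snapshot")
        self.snapshot_threshold = snapshot_threshold
        self.timeseries = set()      # stands in for the MTree
        self.lines_logged = 0        # total mlog lines written so far
        self.lines_in_snapshot = 0   # mlog lines already covered by the snapshot

    def create_timeseries(self, path):
        # Append the command to mlog, then apply it in memory.
        with open(self.mlog_path, "a", encoding="utf-8") as mlog:
            mlog.write(f"create,{path}\n")
        self.timeseries.add(path)
        self.lines_logged += 1
        # Trigger a snapshot once enough new lines have accumulated.
        if self.lines_logged - self.lines_in_snapshot >= self.snapshot_threshold:
            self.create_snapshot()

    def create_snapshot(self):
        # Write to a temp file, then rename over the old snapshot so the
        # old one is replaced only after the new one is complete.
        tmp_path = self.snapshot_path + ".tmp"
        state = {"lines": self.lines_logged, "timeseries": sorted(self.timeseries)}
        with open(tmp_path, "w", encoding="utf-8") as out:
            json.dump(state, out)
        os.replace(tmp_path, self.snapshot_path)
        self.lines_in_snapshot = self.lines_logged

    def recover(self):
        """Load the snapshot if present, then replay only the mlog tail.

        Returns the number of mlog lines that had to be replayed.
        """
        self.timeseries = set()
        self.lines_in_snapshot = 0
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as src:
                state = json.load(src)
            self.timeseries = set(state["timeseries"])
            self.lines_in_snapshot = state["lines"]
        replayed = 0
        total_lines = 0
        if os.path.exists(self.mlog_path):
            with open(self.mlog_path, encoding="utf-8") as mlog:
                for total_lines, line in enumerate(mlog, start=1):
                    if total_lines <= self.lines_in_snapshot:
                        continue  # already reflected in the snapshot
                    _, path = line.rstrip("\n").split(",", 1)
                    self.timeseries.add(path)
                    replayed += 1
        self.lines_logged = max(total_lines, self.lines_in_snapshot)
        return replayed


def demo():
    # 25 timeseries with a threshold of 10: snapshots happen at lines 10 and 20.
    with tempfile.TemporaryDirectory() as directory:
        log = SchemaLog(directory, snapshot_threshold=10)
        for i in range(25):
            log.create_timeseries(f"root.sg.d{i}.s0")
        restarted = SchemaLog(directory, snapshot_threshold=10)
        replayed = restarted.recover()
        return replayed, len(restarted.timeseries)
```

In this sketch the restart cost is the snapshot load plus at most one threshold's worth of mlog replay, which is the trade-off discussed in the thread: a small threshold means frequent snapshots during creation, a large one means a longer replay tail on recovery.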
