Re: [IOTDB-726] CheckPoint of MTree

孙泽嵩 Fri, 19 Jun 2020 08:12:17 -0700

Hi Julian and Xiangdong,

> Another thing we could consider is to chunk them according to their 
> namespaces in folders / files or any other struct.
> according to the Storage group names, for example.


Good suggestions! I’ll consider it as a future sub-task.
For now I’m focusing on the scenario where a single storage group holds a 
large number of timeseries.

> By using a faster serialization Lib than Java default we could ideally speed 
> up the process up to 10x.

I tested the library and got results similar to my own serialization method:
Serialization: mine 971ms / fast-serialization ~6.7s
Deserialization: mine ~6s / fast-serialization ~7.1s

That is because I didn’t actually use Java’s default serialization; instead I 
treated the MTree as many MNodes and serialized each of them separately as 
plain text.
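
To make the per-node idea concrete, here is a minimal Java sketch. The class names `SketchNode` and `PlainTextSnapshot` are hypothetical, and the line format (`name,childCount`, written in post-order) is an assumption chosen for illustration; it is not necessarily the format used in the PR.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an MTree node; not IoTDB's real MNode class.
class SketchNode {
    final String name;
    final Map<String, SketchNode> children = new LinkedHashMap<>();

    SketchNode(String name) {
        this.name = name;
    }

    // Returns the existing child with this name, or creates it.
    SketchNode addChild(String childName) {
        return children.computeIfAbsent(childName, SketchNode::new);
    }
}

class PlainTextSnapshot {
    // Writes one text line per node, children before their parent (post-order).
    // Assumes node names contain no line breaks.
    static List<String> serialize(SketchNode root) {
        List<String> lines = new ArrayList<>();
        write(root, lines);
        return lines;
    }

    private static void write(SketchNode node, List<String> lines) {
        for (SketchNode child : node.children.values()) {
            write(child, lines);
        }
        lines.add(node.name + "," + node.children.size());
    }

    // Rebuilds the tree with a stack: each line pops as many finished
    // subtrees as its child count says, then pushes the new node.
    static SketchNode deserialize(List<String> lines) {
        Deque<SketchNode> stack = new ArrayDeque<>();
        for (String line : lines) {
            int comma = line.lastIndexOf(',');
            SketchNode node = new SketchNode(line.substring(0, comma));
            int childCount = Integer.parseInt(line.substring(comma + 1));
            SketchNode[] kids = new SketchNode[childCount];
            for (int i = childCount - 1; i >= 0; i--) {
                kids[i] = stack.pop(); // popped in reverse of write order
            }
            for (SketchNode kid : kids) {
                node.children.put(kid.name, kid);
            }
            stack.push(node);
        }
        return stack.pop();
    }
}
```

Because every node is an independent line, the reader only has to parse short strings and never goes through Java's object-stream machinery.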

> The goal of checkpoint of MTree is accelerating the deserialization when 
> restarting.
> So, just find an idle time of MTree and snapshot it asynchronously is ok.

I agree with Jialin’s idea. I intend to try checking the state of the MTree 
(whether users are actively changing it) and serializing the MTree in the 
background.
In this way, the client will not be stuck and users may not even notice it.
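
A rough Java sketch of such an idle check, under stated assumptions: `IdleSnapshotScheduler`, `markChanged`, and `snapshotIfIdle` are hypothetical names, and the idle threshold and check period are placeholders rather than values from IoTDB.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper; names and thresholds are illustrative, not IoTDB's.
class IdleSnapshotScheduler {
    private final AtomicLong lastChangeNanos = new AtomicLong(System.nanoTime());
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final long idleNanos;
    private final Runnable snapshotTask;
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "mtree-snapshot");
                t.setDaemon(true); // never keeps the JVM alive on shutdown
                return t;
            });

    IdleSnapshotScheduler(long idleMillis, Runnable snapshotTask) {
        this.idleNanos = TimeUnit.MILLISECONDS.toNanos(idleMillis);
        this.snapshotTask = snapshotTask;
    }

    // To be called by every operation that modifies the schema tree.
    void markChanged() {
        lastChangeNanos.set(System.nanoTime());
        dirty.set(true);
    }

    // Runs the snapshot only if there are unsaved changes and the tree
    // has been untouched for at least the idle threshold.
    boolean snapshotIfIdle(long nowNanos) {
        if (nowNanos - lastChangeNanos.get() < idleNanos) {
            return false; // still being modified
        }
        if (!dirty.compareAndSet(true, false)) {
            return false; // nothing new since the last snapshot
        }
        try {
            snapshotTask.run();
            return true;
        } catch (RuntimeException e) {
            dirty.set(true); // retry on a later check
            return false;
        }
    }

    // Starts the periodic background check.
    void start(long checkPeriodMillis) {
        executor.scheduleWithFixedDelay(
                () -> snapshotIfIdle(System.nanoTime()),
                checkPeriodMillis, checkPeriodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        executor.shutdownNow();
    }
}
```

A change can still arrive between the idle check and the start of the snapshot, so the snapshot task itself would need to take a consistent view of the tree (for example under a read lock); that part is left out of the sketch.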


Best,
-----------------------------------
Zesong Sun
School of Software, Tsinghua University

孙泽嵩
清华大学 软件学院

> On 19 Jun 2020, at 18:10, Julian Feinauer <[email protected]> wrote:
> 
> Yes, we could then also use all cores for deserialization (if that's the 
> bottleneck) when reloading all of them.
> Or, more generally, store only some K entries per file and then open another 
> one; that way we could again take advantage of parallelism.
> 
> J
> 
> On 19.06.20, 11:35, "Xiangdong Huang" <[email protected]> wrote:
> 
>> Another thing we could consider is to chunk them according to their
>    namespaces in folders / files or any other struct.
> 
>    according to the Storage group names, for example.
> 
>    -----------------------------------
>    Xiangdong Huang
>    School of Software, Tsinghua University
> 
>     黄向东
>    清华大学 软件学院
> 
> 
>    On Fri, 19 Jun 2020 at 4:54 PM, Julian Feinauer <[email protected]> wrote:
> 
>> Another thing we could consider is to chunk them according to their
>> namespaces in folders / files or any other struct. Then we could
>> efficiently do lazy loading and only pick what we really need.
>> 
>> WDYT?
>> 
>> On 19.06.20, 10:36, "Xiangdong Huang" <[email protected]> wrote:
>> 
>>> I did an experiment for 1M timeseries, and the serialization process
>>    costs 971ms.
>> 
>>    971ms for Serializing 1M timeseries, but 6 seconds for deserializing?
>> 
>>> I didn’t time this … I’ll do an experiment after fixing the suggested
>>    changes in current PR [1]
>> 
>>    The problem with the current PR is that your snapshot grows larger and
>>    larger as the system keeps running.
>>    Any idea about this case?
>> 
>>    Best,
>>    -----------------------------------
>>    Xiangdong Huang
>>    School of Software, Tsinghua University
>> 
>>     黄向东
>>    清华大学 软件学院
>> 
>> 
>>    On Fri, 19 Jun 2020 at 2:20 PM, 孙泽嵩 <[email protected]> wrote:
>> 
>>> Wow, thanks, Julian!
>>> 
>>> Let me try and do experiments to get the best result : )
>>> 
>>> Best,
>>> -----------------------------------
>>> Zesong Sun
>>> School of Software, Tsinghua University
>>> 
>>> 孙泽嵩
>>> 清华大学 软件学院
>>> 
>>>> On 19 Jun 2020, at 14:14, Julian Feinauer <[email protected]> wrote:
>>>> 
>>>> Oh, and another note: by using a faster serialization lib than Java's
>>>> default we could ideally speed up the process by up to 10x.
>>>> 
>>>> See eg here https://github.com/RuedigerMoeller/fast-serialization
>>>> 
>>>> Julian
>>>> 
>>>> ________________________________
>>>> From: Julian Feinauer <[email protected]>
>>>> Sent: Friday, June 19, 2020 8:11:56 AM
>>>> To: [email protected] <[email protected]>
>>>> Subject: Re: [IOTDB-726] CheckPoint of MTree
>>>> 
>>>> What about using some kind of cache that spills to disk? That way we
>>>> would be up in no time and just lazy-load devices when needed.
>>>> 
>>>> I remember that Ehcache has such features (
>>>> https://www.baeldung.com/ehcache) but there are other implementations as
>>>> well.
>>>> 
>>>> Julian
>>>> 
>>>> ________________________________
>>>> From: 孙泽嵩 <[email protected]>
>>>> Sent: Friday, June 19, 2020 7:57:51 AM
>>>> To: [email protected] <[email protected]>
>>>> Subject: Re: [IOTDB-726] CheckPoint of MTree
>>>> 
>>>> Hi Jialin,
>>>> 
>>>> I did an experiment for 1M timeseries, and the serialization process
>>>> costs 971ms.
>>>> 
>>>> Maybe we could consider creating a snapshot when the MTree has not
>>>> changed for a long time (for example, one hour).
>>>> 
>>>> In this way, the client will not be stuck and users may not even notice
>>>> it.
>>>> 
>>>> 
>>>> Best,
>>>> -----------------------------------
>>>> Zesong Sun
>>>> School of Software, Tsinghua University
>>>> 
>>>> 孙泽嵩
>>>> 清华大学 软件学院
>>>> 
>>>>> On 18 Jun 2020, at 16:19, 孙泽嵩 <[email protected]> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Good opinions!
>>>>> 
>>>>>> how about adding a "create snapshot for schema" sql to let users
>>> trigger this manually
>>>>> 
>>>>> I’ll add this sql in a new PR.
>>>>> 
>>>>>> how long it takes to recover from a 1M timeseries snapshot.
>>>>> 
>>>>> Based on my previous experiment, it takes about 6s as you said.
>>>>> 
>>>>>> how long it takes to create a snapshot for 1M/10M timeseries?
>>>>> 
>>>>> I didn’t time this … I’ll do an experiment after fixing the suggested
>>>>> changes in current PR [1]
>>>>> 
>>>>> 
>>>>> [1] https://github.com/apache/incubator-iotdb/pull/1384
>>>>> 
>>>>> 
>>>>> Best,
>>>>> -----------------------------------
>>>>> Zesong Sun
>>>>> School of Software, Tsinghua University
>>>>> 
>>>>> 孙泽嵩
>>>>> 清华大学 软件学院
>>>>> 
>>>>>> On 18 Jun 2020, at 14:39, Jialin Qiao <[email protected]> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Currently, a snapshot is triggered every xxx lines in mlog.txt.
>>>>>> With 20M timeseries, the default of 10k lines will cause too many
>>>>>> snapshots, which will block timeseries creation.
>>>>>> However, if we raise the threshold to 1M, the last 1M lines will take
>>>>>> about 6s to recover, at about 160K per second.
>>>>>> 
>>>>>> So, my concern is: how long does it take to create a snapshot for
>>>>>> 1M/10M timeseries? And how long does it take to recover from a 1M
>>>>>> timeseries snapshot?
>>>>>> 
>>>>>> Besides, how about adding a "create snapshot for schema" sql to let
>>>>>> users trigger this manually?
>>>>>> 
>>>>>> Thanks,
>>>>>> --
>>>>>> Jialin Qiao
>>>>>> School of Software, Tsinghua University
>>>>>> 
>>>>>> 乔嘉林
>>>>>> 清华大学 软件学院
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: "孙泽嵩" <[email protected]>
>>>>>>> Sent: 2020-06-15 19:14:08 (Monday)
>>>>>>> To: [email protected]
>>>>>>> Cc:
>>>>>>> Subject: Re: [IOTDB-726] CheckPoint of MTree
>>>>>>> 
>>>>>>> Hi Julian,
>>>>>>> 
>>>>>>> Currently I’m just using a plain text file.
>>>>>>> 
>>>>>>> But I could consider and try with RocksDB : )
>>>>>>> I also noticed that there is an issue related to RocksDB integration
>>>>>>> [1].
>>>>>>> 
>>>>>>> 
>>>>>>> [1] https://issues.apache.org/jira/browse/IOTDB-767
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> -----------------------------------
>>>>>>> Zesong Sun
>>>>>>> School of Software, Tsinghua University
>>>>>>> 
>>>>>>> 孙泽嵩
>>>>>>> 清华大学 软件学院
>>>>>>> 
>>>>>>>> On 15 Jun 2020, at 19:00, Julian Feinauer <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi Zesong,
>>>>>>>> 
>>>>>>>> this is an excellent idea!
>>>>>>>> Do you serialize the snapshot as a plain text file?
>>>>>>>> Or would it make sense to use something like RocksDB for
>>>>>>>> something like that?
>>>>>>>> 
>>>>>>>> Julian
>>>>>>>> 
>>>>>>>> On 15.06.20, 12:12, "孙泽嵩" <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Greetings,
>>>>>>>> 
>>>>>>>> I’m currently working on issue [IOTDB-726] CheckPoint of MTree [1]
>>>>>>>> 
>>>>>>>> When there are a large number of timeseries, it takes a long time
>>>>>>>> to restart IoTDB by reading mlog.txt and executing the commands
>>>>>>>> line by line.
>>>>>>>> For example, it takes about 2 minutes to restart with 20M timeseries.
>>>>>>>> 
>>>>>>>> To solve this problem, a “checkpoint” is designed and added to MTree
>>>>>>>> to reduce the time spent reading mlog when IoTDB restarts:
>>>>>>>> a snapshot, which contains the serialized MTree, is generated
>>>>>>>> every time mlog reaches a certain number of lines.
>>>>>>>> When a new snapshot is generated, the old one is deleted. The snapshot
>>>>>>>> file and mlog.txt are in the same directory.
>>>>>>>> 
>>>>>>>> Users can configure the threshold number of mlog lines. By
>>>>>>>> default, a snapshot is generated every 100k lines.
>>>>>>>> 
>>>>>>>> I’ve already made a demo and shown that the method speeds up
>>>>>>>> the restart process.
>>>>>>>> For the part that reads mlog.txt and initializes the MTree, it reduces
>>>>>>>> the time by 28.3% (16.6s with the original method, 11.9s with the new
>>>>>>>> demo, both for 2M timeseries).
>>>>>>>> 
>>>>>>>> I would like to make a PR afterwards. If you have any suggestions
>>>>>>>> about the design, feel free to discuss them with me.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [1] https://issues.apache.org/jira/browse/IOTDB-726
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> -----------------------------------
>>>>>>>> Zesong Sun
>>>>>>>> School of Software, Tsinghua University
>>>>>>>> 
>>>>>>>> 孙泽嵩
>>>>>>>> 清华大学 软件学院
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>
