+1 It’s a good idea. Thanks,
Haonan Hou

> On Feb 13, 2020, at 5:20 PM, Dawei Liu <atoi...@163.com> wrote:
>
> Hi,
>
> Sorry, I overlooked that the first step in the server is to filter files
> through the startTime/endTime Map.
>
> +1 for adding a Statistics to TimeseriesMetadata.
>
> For example: with a Device Shadow (设备影子), it is often necessary to find
> the last information about a device.
>
> Thanks
>
> Dawei Liu
>
>> On Feb 13, 2020, at 4:58 PM, Jialin Qiao <qj...@mails.tsinghua.edu.cn> wrote:
>>
>> Hi,
>>
>> I have a suggestion.
>> We could add a Statistics to TimeseriesMetadata to support fast aggregations.
>>
>> Thanks,
>> --
>> Jialin Qiao
>> School of Software, Tsinghua University
>>
>> 乔嘉林
>> 清华大学 软件学院
>>
>>> -----Original Message-----
>>> From: "Jialin Qiao" <qj...@mails.tsinghua.edu.cn>
>>> Sent: 2020-02-13 16:16:54 (Thursday)
>>> To: dev@iotdb.apache.org
>>> Cc:
>>> Subject: Re: Suggestions for new TsFile
>>>
>>> Hi,
>>>
>>> +1 for most queries containing a time filter.
>>>
>>> But I don't know what you mean by "add a time attribute". Add it to where?
>>>
>>> Thanks,
>>> --
>>> Jialin Qiao
>>> School of Software, Tsinghua University
>>>
>>> 乔嘉林
>>> 清华大学 软件学院
>>>
>>>> -----Original Message-----
>>>> From: "Dawei Liu" <atoi...@163.com>
>>>> Sent: 2020-02-13 15:55:48 (Thursday)
>>>> To: dev@iotdb.apache.org
>>>> Cc:
>>>> Subject: Re: Suggestions for new TsFile
>>>>
>>>> Hi,
>>>>
>>>> I found another problem. When I execute `SELECT s1 FROM xx WHERE time = 1`
>>>> in the new TsFile, the hard drive has to be read 3 times:
>>>>
>>>> 1. Read TsFileMetaData.
>>>>
>>>> 2. Read the metadata of all measurements of the device
>>>> (TimeSeriesMetaData).
>>>>
>>>> 3. Read the ChunkMetaData of the required measurement; only then can the
>>>> time filter (time = 1) determine which Chunks can be used.
>>>>
>>>> In the current server, most of the time is spent on the TimeFilter. We read
>>>> a lot of metadata, and if in the end it cannot be used, this is a very
>>>> big loss.
>>>>
>>>> So I think we should add a time attribute, so that we can know whether the
>>>> file cannot be used as soon as we first read the hard drive.
>>>>
>>>> Thanks
>>>>
>>>> Dawei Liu
>>>>
>>>>> On Feb 12, 2020, at 8:03 PM, Dawei Liu <atoi...@163.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I see it. It looks very comfortable.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dawei Liu
>>>>>
>>>>>> On Feb 12, 2020, at 6:52 PM, Haonan Hou <hhao...@outlook.com> wrote:
>>>>>>
>>>>>> Sure, already added.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Haonan Hou
>>>>>>
>>>>>>> On Feb 12, 2020, at 5:57 PM, Jialin Qiao <qj...@mails.tsinghua.edu.cn>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Haonan,
>>>>>>>
>>>>>>> I cannot see the picture. Could you please put it in the PR?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Jialin Qiao
>>>>>>> School of Software, Tsinghua University
>>>>>>>
>>>>>>> 乔嘉林
>>>>>>> 清华大学 软件学院
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: "Haonan Hou" <hhao...@outlook.com>
>>>>>>> Sent: 2020-02-12 16:27:22 (Wednesday)
>>>>>>> To: "dev@iotdb.apache.org" <dev@iotdb.apache.org>
>>>>>>> Cc:
>>>>>>> Subject: Re: Suggestions for new TsFile
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have a newer design of TsFile, which combines the suggestions from
>>>>>>> Jialin and Dawei.
>>>>>>>
>>>>>>> The main differences are as follows:
>>>>>>>
>>>>>>> 1. Remove TsOffsetArray.
>>>>>>> 2. Modify the device map in TsFileMetaData to store the start offset of
>>>>>>> the first TimeseriesMetadata and the total data size of all
>>>>>>> TimeseriesMetadata in each device.
>>>>>>>
>>>>>>> The newer TsFile structure should look like this:
>>>>>>>
>>>>>>> Here is an example of how the new structure works.
>>>>>>>
>>>>>>> When we try to get the List<ChunkMetadata> of timeseries "d0.s1", first we
>>>>>>> deserialize the map in TsFileMetadata, which gives us the startOffset of
>>>>>>> TimeseriesMetadata "s0", the first TimeseriesMetadata of "d0", and the
>>>>>>> data size of all TimeseriesMetadata in "d0".
>>>>>>>
>>>>>>> After that, we are able to deserialize all TimeseriesMetadata in "d0".
>>>>>>>
>>>>>>> Finally, we have the TimeseriesMetadata "d0.s1" and can get the
>>>>>>> ChunkMetadata list of "d0.s1".
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Haonan Hou
>>>>>>>
>>>>>>>
>>>>>>> On Feb 11, 2020, at 8:08 PM, Jialin Qiao <qiaojia...@apache.org> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> If each device only stores the offset of each TimeseriesMetadata, like this:
>>>>>>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>>>>>>>
>>>>>>> It could be simplified to recording the start offset and end offset:
>>>>>>> TsFileMetaData ---> [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]
>>>>>>>
>>>>>>> And finally, it could be replaced by:
>>>>>>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Jialin Qiao
>>>>>>> School of Software, Tsinghua University
>>>>>>>
>>>>>>> 乔嘉林
>>>>>>> 清华大学 软件学院
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 11, 2020, at 7:59 PM, atoiLiu <atoi...@163.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thank you for your reply.
>>>>>>> I am very happy that you can take my suggestion.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Dawei Liu
>>>>>>>
>>>>>>>
>>>>>>> On Feb 11, 2020, at 6:04 PM, Haonan Hou <hhao...@outlook.com> wrote:
>>>>>>>
>>>>>>> Hi Dawei,
>>>>>>>
>>>>>>> Thank you so much for sharing your opinion about the new TsFile!
>>>>>>> I am very happy to take your suggestions.
>>>>>>>
>>>>>>> You said we can remove TsOffsetArray and directly store the offsets of
>>>>>>> TimeseriesMetaData. I agree with you. It is better than my version.
>>>>>>> Besides, for the optimization of TimeseriesMetaData, I would like to
>>>>>>> discuss with other people to determine which way is better.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Haonan Hou
>>>>>>>
>>>>>>>
>>>>>>> On Feb 11, 2020, at 5:35 PM, atoiLiu <atoi...@163.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm learning the new TsFile in PR [1], but I think TsFileMetaData has a bad
>>>>>>> design.
>>>>>>>
>>>>>>> TsFileMetaData has a TsOffsetArray. TsOffsetArray records the offset of
>>>>>>> every TimeseriesMetaData, and a Map<deviceId, int[]> records the
>>>>>>> startIndex and endIndex into TsOffsetArray. It looks like:
>>>>>>>
>>>>>>> TsFileMetaData ---> { [0,1,2,3,4,5, …], [ {deviceId(d0), [0,2] },
>>>>>>> {deviceId(d1), [3,5] }, … ] }
>>>>>>>
>>>>>>> We can delete TsOffsetArray and store the offsets directly in the
>>>>>>> deviceIndexArray, so that TsFileMetadata will have a
>>>>>>> Map<deviceId, List<Long>> to record them. This change will save 4 bytes per
>>>>>>> device on disk, because every device just needs to record the number of
>>>>>>> offsets and the offsets. It looks like:
>>>>>>>
>>>>>>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>>>>>>>
>>>>>>> In addition, TimeSeriesMetaData is an ordered structure on the hard
>>>>>>> disk, and the TimeSeriesMetaData for each device are linked together, so
>>>>>>> TsFileMetaData does not need to store all the offset information. There are
>>>>>>> two optimization directions:
>>>>>>>
>>>>>>> 1. Save the startTime, endTime and offset for each TimeSeriesMetaData in
>>>>>>> TsFileMetaData. The nice thing about this is that when you read
>>>>>>> TsFileMetaData from the hard drive, you can directly apply a filter to
>>>>>>> decide which TimeSeriesMetaData do not need to be read.
>>>>>>>
>>>>>>> 2. Only save the offset of the first TimeSeriesMetaData in TsFileMetaData,
>>>>>>> so that you can loop through them with just one seek. It looks like:
>>>>>>>
>>>>>>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>>>>>>>
>>>>>>> [1] https://github.com/apache/incubator-iotdb/pull/736
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Dawei Liu
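
For readers following the thread, here is a minimal sketch of the layout the discussion converges on: a device map that stores only the start offset and total size of each device's contiguous TimeseriesMetadata block, plus a per-series time range used for filtering. It is written in Python purely for illustration. The fixed-size record format (RECORD) and the names write_file, read_device and may_contain are hypothetical; they are not the real TsFile encoding or any IoTDB API.

```python
import io
import struct

# Hypothetical fixed-size record: 8-byte measurement name, start time, end time.
# This is an assumption for the sketch, NOT the real TsFile serialization.
RECORD = struct.Struct(">8sqq")


def write_file(devices):
    """Serialize {device: [(measurement, start_time, end_time), ...]}.

    Returns (file_bytes, device_map), where
    device_map[device] = (start_offset, total_size) of that device's block.
    """
    buf = io.BytesIO()
    device_map = {}
    for device, series in devices.items():
        start = buf.tell()
        for name, t0, t1 in series:
            buf.write(RECORD.pack(name.encode(), t0, t1))
        device_map[device] = (start, buf.tell() - start)
    return buf.getvalue(), device_map


def read_device(data, device_map, device):
    """One seek plus one contiguous read yields every record of a device."""
    start, size = device_map[device]
    f = io.BytesIO(data)
    f.seek(start)
    block = f.read(size)
    result = {}
    for name, t0, t1 in RECORD.iter_unpack(block):
        result[name.rstrip(b"\0").decode()] = (t0, t1)
    return result


def may_contain(time_range, t):
    """Time filter on the stored range: False means the series can be skipped."""
    t0, t1 = time_range
    return t0 <= t <= t1


data, device_map = write_file({
    "d0": [("s0", 0, 10), ("s1", 5, 20)],
    "d1": [("s0", 100, 200)],
})
series = read_device(data, device_map, "d0")
skip_s1 = not may_contain(series["s1"], 1)  # query: WHERE time = 1
```

Because the records of one device are contiguous, the map needs only two numbers per device instead of one offset per series, and the stored time range lets a query such as `time = 1` skip "d0.s1" before any ChunkMetaData is read.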