Hi,

Sorry, I overlooked that the first step in the server is to filter files through the startTime/endTime Map.

+1 for adding a Statistics to TimeseriesMetadata. For example, with a Device Shadow (设备影子) it is often necessary to find the latest information about a device.

Thanks

Dawei Liu

> On Feb 13, 2020, at 4:58 PM, Jialin Qiao <[email protected]> wrote:
>
> Hi,
>
> I have a suggestion.
> We could add a Statistics in TimeseriesMetadata to support fast aggregations.
>
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
>
> 乔嘉林
> 清华大学 软件学院
>
>> -----Original Message-----
>> From: "Jialin Qiao" <[email protected]>
>> Sent: 2020-02-13 16:16:54 (Thursday)
>> To: [email protected]
>> Cc:
>> Subject: Re: Suggestions for new TsFile
>>
>> Hi,
>>
>> +1, most queries contain a time filter.
>>
>> But I don't know what you mean by "add a time attribute". Add it to where?
>>
>> Thanks,
>> --
>> Jialin Qiao
>> School of Software, Tsinghua University
>>
>> 乔嘉林
>> 清华大学 软件学院
>>
>>> -----Original Message-----
>>> From: "Dawei Liu" <[email protected]>
>>> Sent: 2020-02-13 15:55:48 (Thursday)
>>> To: [email protected]
>>> Cc:
>>> Subject: Re: Suggestions for new TsFile
>>>
>>> Hi,
>>>
>>> I found another problem. When I execute `SELECT s1 FROM xx WHERE time = 1`
>>> in the new TsFile, we need to read the hard drive 3 times:
>>>
>>> 1. Read TsFileMetaData
>>>
>>> 2. Read the MetaData of all measurements of the device (TimeSeriesMetaData)
>>>
>>> 3. Read the ChunkMetaData of the required measurement; only then can the time filter (time = 1) determine which Chunks can be used
>>>
>>> In the current server, most of the time is spent on the TimeFilter. We read a lot of metadata, and if in the end it cannot be used, this is a very big loss.
>>>
>>> So I think we should add a time attribute so that we can know whether the file is unusable when we first read the hard drive.
>>>
>>> Thanks
>>>
>>> Dawei Liu
>>>
>>>> On Feb 12, 2020, at 8:03 PM, Dawei Liu <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I see it. It looks very comfortable.
>>>>
>>>> Thanks
>>>>
>>>> Dawei Liu
>>>>
>>>>> On Feb 12, 2020, at 6:52 PM, Haonan Hou <[email protected]> wrote:
>>>>>
>>>>> Sure, already added.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Haonan Hou
>>>>>
>>>>>> On Feb 12, 2020, at 5:57 PM, Jialin Qiao <[email protected]> wrote:
>>>>>>
>>>>>> Hi Haonan,
>>>>>>
>>>>>> I cannot see the picture. Could you please put it in the PR?
>>>>>>
>>>>>> Thanks,
>>>>>> --
>>>>>> Jialin Qiao
>>>>>> School of Software, Tsinghua University
>>>>>>
>>>>>> 乔嘉林
>>>>>> 清华大学 软件学院
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "Haonan Hou" <[email protected]>
>>>>>> Sent: 2020-02-12 16:27:22 (Wednesday)
>>>>>> To: "[email protected]" <[email protected]>
>>>>>> Cc:
>>>>>> Subject: Re: Suggestions for new TsFile
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a newer design of TsFile, which combines the suggestions from Jialin and Dawei.
>>>>>>
>>>>>> The main differences are as below:
>>>>>>
>>>>>> 1. Remove TsOffsetArray.
>>>>>> 2. Modify the device map in TsFileMetaData to store the start offset of the first TimeseriesMetadata and the total data size of all TimeseriesMetadata in each device.
>>>>>>
>>>>>> The newer TsFile structure should look like:
>>>>>>
>>>>>> Here is an example of how the new structure works.
>>>>>>
>>>>>> When we try to get the List<ChunkMetadata> of timeseries "d0.s1", we first deserialize the map in TsFileMetadata, which gives us the startOffset of TimeseriesMetadata "s0" (the first TimeseriesMetadata of "d0") and the data size of all TimeseriesMetadata in "d0".
>>>>>>
>>>>>> After that, we are able to deserialize all TimeseriesMetadata in "d0".
>>>>>>
>>>>>> Finally we have the TimeseriesMetadata "d0.s1" and can get the ChunkMetadata List of "d0.s1".
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Haonan Hou
>>>>>>
>>>>>> On Feb 11, 2020, at 8:08 PM, Jialin Qiao <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> If each device only stores each offset of TimeseriesMetadata like this:
>>>>>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>>>>>>
>>>>>> It could be simplified to recording the start offset and end offset:
>>>>>> TsFileMetaData ---> [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]
>>>>>>
>>>>>> And finally, it could be replaced by:
>>>>>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>>>>>>
>>>>>> Thanks,
>>>>>> --
>>>>>> Jialin Qiao
>>>>>> School of Software, Tsinghua University
>>>>>>
>>>>>> 乔嘉林
>>>>>> 清华大学 软件学院
>>>>>>
>>>>>> On Tue, Feb 11, 2020, at 7:59 PM, atoiLiu <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thank you for your reply.
>>>>>> I am very happy that you can take my suggestion.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Dawei Liu
>>>>>>
>>>>>> On Feb 11, 2020, at 6:04 PM, Haonan Hou <[email protected]> wrote:
>>>>>>
>>>>>> Hi Dawei,
>>>>>>
>>>>>> Thank you so much for sharing your opinion about the new TsFile!
>>>>>> I am very happy to take your suggestions.
>>>>>>
>>>>>> You said we can remove TsOffsetArray and directly store the offsets of TimeseriesMetaData. I agree with you. It is better than my version.
>>>>>> Besides, for the optimization of TimeseriesMetaData, I would like to discuss with other people to determine which way is better.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Haonan Hou
>>>>>>
>>>>>> On Feb 11, 2020, at 5:35 PM, atoiLiu <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm learning the new TsFile in PR [1], but I think TsFileMetaData has a bad design.
>>>>>>
>>>>>> TsFileMetaData has a TsOffsetArray that records every offset of TimeseriesMetaData, and uses a Map<deviceId, int[]> to record the startIndex and endIndex into TsOffsetArray. It looks like:
>>>>>>
>>>>>> TsFileMetaData ---> { [0,1,2,3,4,5, …] [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ] }
>>>>>>
>>>>>> We can delete TsOffsetArray and store the offsets directly in the deviceIndexArray, so that TsFileMetadata will have a Map<deviceId, List<Long>> to record them. This change will save 4 bytes per device on disk, because every device just needs to record the number of offsets and the offsets. It looks like:
>>>>>>
>>>>>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>>>>>>
>>>>>> In addition, TimeSeriesMetaData is an ordered structure on the hard disk, and the TimeSeriesMetaData for each device is linked together, so TsFileMetaData does not need to store all offset information. So there are two optimization directions:
>>>>>>
>>>>>> 1. Save startTime, endTime and offset for each TimeSeriesMetaData in TsFileMetaData. The nice thing about this is that when you read TsFileMetaData from your hard drive, you can directly apply a filter to skip the TimeSeriesMetaData that does not need to be read.
>>>>>>
>>>>>> 2. Only save the offset of the first TimeSeriesMetaData in TsFileMetaData, so that you can loop through them with just one seek. It looks like:
>>>>>>
>>>>>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>>>>>>
>>>>>> [1] https://github.com/apache/incubator-iotdb/pull/736
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Dawei Liu
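
To make the layout discussed in this thread concrete, here is a minimal Python sketch of a device map that stores only the start offset and total size of each device's TimeseriesMetadata, plus a per-series time range used to skip data early. It is only an illustration under stated assumptions: the record format (name length, name, start time, end time) and the names encode_entry, build_file, find_timeseries and may_contain are hypothetical and are not the real TsFile format or IoTDB API.

```python
import struct
from typing import Dict, List, Optional, Tuple

# Hypothetical record for one TimeseriesMetadata entry (not the real TsFile
# format): 2-byte name length, UTF-8 name, start_time and end_time as int64.
def encode_entry(name: str, start_time: int, end_time: int) -> bytes:
    raw = name.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw + struct.pack(">qq", start_time, end_time)

def build_file(
    devices: Dict[str, List[Tuple[str, int, int]]]
) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Write each device's entries contiguously and return the bytes together
    with the device map {device_id: (start_offset, total_size)}."""
    body = bytearray()
    device_map: Dict[str, Tuple[int, int]] = {}
    for device_id, entries in devices.items():
        start = len(body)
        for name, start_time, end_time in entries:
            body += encode_entry(name, start_time, end_time)
        device_map[device_id] = (start, len(body) - start)
    return bytes(body), device_map

def find_timeseries(
    body: bytes,
    device_map: Dict[str, Tuple[int, int]],
    device_id: str,
    measurement: str,
) -> Optional[Tuple[int, int]]:
    """Read the device's whole metadata block in one slice (standing in for a
    single seek + read), then scan it for the requested measurement."""
    if device_id not in device_map:
        return None
    start, size = device_map[device_id]
    block = body[start:start + size]
    pos = 0
    while pos < len(block):
        (name_len,) = struct.unpack_from(">H", block, pos)
        pos += 2
        name = block[pos:pos + name_len].decode("utf-8")
        pos += name_len
        start_time, end_time = struct.unpack_from(">qq", block, pos)
        pos += 16
        if name == measurement:
            return start_time, end_time
    return None

def may_contain(time_range: Tuple[int, int], t: int) -> bool:
    """Time filter for `time = t`: False means the series can be skipped
    without reading any ChunkMetadata."""
    start_time, end_time = time_range
    return start_time <= t <= end_time

# Example data: device d0 has series s0 and s1, device d1 has series s0.
body, device_map = build_file({
    "d0": [("s0", 1, 10), ("s1", 5, 20)],
    "d1": [("s0", 100, 200)],
})
d0_s1_range = find_timeseries(body, device_map, "d0", "s1")
```

With this sketch, a query such as `SELECT s1 FROM d0 WHERE time = 1` would look up d0_s1_range and call may_contain(d0_s1_range, 1) before touching any chunk-level metadata. Whether the time range belongs in TimeseriesMetadata or one level higher is exactly the open question in the thread, so the placement here is just one possible choice.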
