Hi,

I found another problem. When I execute `SELECT s1 FROM xx WHERE time = 1` against the new TsFile, we need to read the hard drive three times:

1. Read the TsFileMetaData.
2. Read the metadata of all measurements of the device (TimeSeriesMetaData).
3. Read the ChunkMetaData of the required measurement. Only then can the time filter (time = 1) decide which Chunks can be used.

In the current server, most of the time is spent on the time filter. We read a lot of metadata, and if in the end none of it can be used, that is a very big loss.

So I think we should add a time attribute, so that we can tell that the file cannot be used as soon as we first read the hard drive.

Thanks

Dawei Liu

> On Feb 12, 2020, at 8:03 PM, Dawei Liu <[email protected]> wrote:
>
> Hi,
>
> I see it now. It looks very good.
>
> Thanks
>
> Dawei Liu
>
>> On Feb 12, 2020, at 6:52 PM, Haonan Hou <[email protected]> wrote:
>>
>> Sure, already added.
>>
>> Thanks,
>>
>> Haonan Hou
>>
>>> On Feb 12, 2020, at 5:57 PM, Jialin Qiao <[email protected]> wrote:
>>>
>>> Hi Haonan,
>>>
>>> I cannot see the picture. Could you please put it in the PR?
>>>
>>> Thanks,
>>> --
>>> Jialin Qiao
>>> School of Software, Tsinghua University
>>>
>>> 乔嘉林
>>> 清华大学 软件学院
>>>
>>> -----Original Message-----
>>> From: "Haonan Hou" <[email protected]>
>>> Sent: 2020-02-12 16:27:22 (Wednesday)
>>> To: "[email protected]" <[email protected]>
>>> Cc:
>>> Subject: Re: Suggestions for new TsFile
>>>
>>> Hi,
>>>
>>> We have a newer design of TsFile, which combines the suggestions from Jialin and Dawei.
>>>
>>> The main differences are as below:
>>>
>>> 1. Remove TsOffsetArray.
>>> 2. Modify the device map in TsFileMetaData to store the start offset of the first TimeseriesMetadata and the total data size of all TimeseriesMetadatas in each device.
>>>
>>> The newer TsFile structure should look like:
>>>
>>> Here is an example of how the new structure works.
>>>
>>> When we try to get the List<ChunkMetadata> of timeseries "d0.s1", we first deserialize the map in TsFileMetadata. This gives us the startOffset of TimeseriesMetadata "s0", which is the first TimeseriesMetadata of "d0", and the data size of all TimeseriesMetadatas in "d0".
>>>
>>> After that, we are able to deserialize all TimeseriesMetadata in "d0".
>>>
>>> Finally, we have the TimeseriesMetadata "d0.s1" and can get the ChunkMetadata list of "d0.s1".
>>>
>>> Thanks,
>>>
>>> Haonan Hou
>>>
>>> On Feb 11, 2020, at 8:08 PM, Jialin Qiao <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> If each device only stores each offset of TimeseriesMetadata like this:
>>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>>>
>>> It could be simplified to recording the start offset and end offset:
>>> TsFileMetaData ---> [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]
>>>
>>> And finally, it could be replaced by:
>>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>>>
>>> Thanks,
>>> --
>>> Jialin Qiao
>>> School of Software, Tsinghua University
>>>
>>> 乔嘉林
>>> 清华大学 软件学院
>>>
>>> On Tue, Feb 11, 2020 at 7:59 PM, atoiLiu <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Thank you for your reply.
>>> I am very happy that you can take my suggestion.
>>>
>>> Thanks
>>>
>>> Dawei Liu
>>>
>>> On Feb 11, 2020, at 6:04 PM, Haonan Hou <[email protected]> wrote:
>>>
>>> Hi Dawei,
>>>
>>> Thank you so much for sharing your opinion about the new TsFile!
>>> I am very happy to take your suggestions.
>>>
>>> You said we can remove TsOffsetArray and directly store the offset of TimeseriesMetaData. I agree with you. It is better than my version.
>>> Besides, for the optimization of TimeseriesMetaData, I would like to discuss with other people to determine which way is better.
>>>
>>> Best,
>>>
>>> Haonan Hou
>>>
>>> On Feb 11, 2020, at 5:35 PM, atoiLiu <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I'm learning the new TsFile in PR [1], but I think TsFileMetaData has a bad design.
>>>
>>> TsFileMetaData has a TsOffsetArray, which records the offset of every TimeseriesMetaData, and uses a Map<deviceId, int[]> to record the startIndex and endIndex into TsOffsetArray. It looks like:
>>>
>>> TsFileMetaData ---> { [0,1,2,3,4,5, ….] [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, …. ] }
>>>
>>> We can delete TsOffsetArray and store the offsets directly in the deviceIndexArray, so that TsFileMetaData will have a Map<deviceId, List<Long>> to record them. This change will save 4 bytes per device on disk, because every device just needs to record the number of offsets and the offsets themselves. It looks like:
>>>
>>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>>>
>>> In addition, TimeSeriesMetaData is an ordered structure on the hard disk, and the TimeSeriesMetaData of each device are stored together, so TsFileMetaData does not need to store all the offset information. There are two optimization directions:
>>>
>>> 1. Save startTime, endTime and offset for each TimeSeriesMetaData in TsFileMetaData. The nice thing about this is that when you read TsFileMetaData from the hard drive, you can directly apply a filter to decide which TimeSeriesMetaData do not need to be read.
>>>
>>> 2. Only save the offset of the first TimeSeriesMetaData in TsFileMetaData, so that you can loop through them and only need to seek once. It looks like:
>>>
>>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>>>
>>> [1] https://github.com/apache/incubator-iotdb/pull/736
>>>
>>> Thanks
>>>
>>> Dawei Liu
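
As a rough illustration of the two ideas discussed in this thread (a per-device start offset plus total size, and a time attribute that allows pruning on the first disk read), here is a minimal Java sketch. All class, field and method names (`DeviceIndexSketch`, `DeviceEntry`, `rangeToRead`) are hypothetical and are not taken from the PR. Storing `startTime` and `endTime` per device is only one assumed way the proposed time attribute could be kept; the thread does not fix its granularity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: these are not the actual TsFile classes from the PR. */
final class DeviceIndexSketch {

    /** Hypothetical index entry kept per device inside TsFileMetaData. */
    static final class DeviceEntry {
        final long startOffset; // offset of the device's first TimeseriesMetadata
        final int totalSize;    // total bytes of all TimeseriesMetadata of the device
        final long startTime;   // assumed extra field: earliest timestamp of the device
        final long endTime;     // assumed extra field: latest timestamp of the device

        DeviceEntry(long startOffset, int totalSize, long startTime, long endTime) {
            this.startOffset = startOffset;
            this.totalSize = totalSize;
            this.startTime = startTime;
            this.endTime = endTime;
        }

        /** True if the device's time range includes the given timestamp. */
        boolean mayContain(long time) {
            return startTime <= time && time <= endTime;
        }
    }

    private final Map<String, DeviceEntry> deviceIndex = new LinkedHashMap<>();

    void put(String deviceId, DeviceEntry entry) {
        deviceIndex.put(deviceId, entry);
    }

    /**
     * Returns {startOffset, endOffsetExclusive} of the device's metadata block,
     * or empty if the device is unknown or its time range excludes the timestamp,
     * in which case no further disk read is needed for this file.
     */
    Optional<long[]> rangeToRead(String deviceId, long time) {
        DeviceEntry entry = deviceIndex.get(deviceId);
        if (entry == null || !entry.mayContain(time)) {
            return Optional.empty();
        }
        return Optional.of(new long[] {entry.startOffset, entry.startOffset + entry.totalSize});
    }

    public static void main(String[] args) {
        DeviceIndexSketch index = new DeviceIndexSketch();
        index.put("d0", new DeviceEntry(0L, 300, 100L, 200L));
        index.put("d1", new DeviceEntry(300L, 150, 1L, 50L));

        // Query "time = 1": d0 is pruned from the index alone, d1 needs one ranged read.
        System.out.println("d0 needs read: " + index.rangeToRead("d0", 1L).isPresent());
        index.rangeToRead("d1", 1L)
             .ifPresent(r -> System.out.println("d1 read range: " + r[0] + ".." + r[1]));
    }
}
```

With an index like this, a query such as `time = 1` could skip the second and third reads whenever the stored time range rules the device out. The cost is two extra long values per device in TsFileMetaData.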
