Sure, already added. Thanks,
Haonan Hou

> On Feb 12, 2020, at 5:57 PM, Jialin Qiao <[email protected]> wrote:
>
> Hi Haonan,
>
> I cannot see the picture, could you please put it in the PR?
>
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
>
> 乔嘉林
> 清华大学 软件学院
>
> -----Original Message-----
> From: "Haonan Hou" <[email protected]>
> Sent: 2020-02-12 16:27:22 (Wednesday)
> To: "[email protected]" <[email protected]>
> Cc:
> Subject: Re: Suggestions for new TsFile
>
> Hi,
>
> We have a newer design of TsFile, which combines the suggestions from Jialin and Dawei.
>
> The main differences are as follows:
>
> 1. Remove TsOffsetArray.
> 2. Modify the device map in TsFileMetaData to store the start offset of the first TimeseriesMetadata and the total data size of all TimeseriesMetadatas in each device.
>
> The newer TsFile structure should look like:
>
> Here is an example of how the new structure works.
>
> When we try to get the List<ChunkMetadata> of timeseries "d0.s1", first we deserialize the map in TsFileMetadata, which gives us the startOffset of TimeseriesMetadata "s0" (the first TimeseriesMetadata of "d0") and the data size of all TimeseriesMetadatas in "d0".
>
> After that, we are able to deserialize all TimeseriesMetadata in "d0".
>
> Finally we have the TimeseriesMetadata "d0.s1" and can get the ChunkMetadata List of "d0.s1".
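The lookup described above (device map entry, then one contiguous read, then a scan for the measurement) can be sketched roughly as follows. This is only an illustration, not the IoTDB implementation: the class names DeviceRange and SeriesMeta, the toy byte layout, and the in-memory ByteBuffer standing in for the file are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DeviceIndexSketch {
    /** Hypothetical device-map entry: start offset of the first TimeseriesMetadata and total size of the block. */
    static final class DeviceRange {
        final int startOffset;
        final int totalSize;
        DeviceRange(int startOffset, int totalSize) {
            this.startOffset = startOffset;
            this.totalSize = totalSize;
        }
    }

    /** Hypothetical stand-in for TimeseriesMetadata: a measurement name plus its chunk offsets. */
    static final class SeriesMeta {
        final String measurement;
        final List<Long> chunkOffsets;
        SeriesMeta(String measurement, List<Long> chunkOffsets) {
            this.measurement = measurement;
            this.chunkOffsets = chunkOffsets;
        }
    }

    /** Toy layout (an assumption): [nameLength][name bytes][chunkCount][chunk offsets...]. */
    static void write(ByteBuffer buf, String measurement, long... chunkOffsets) {
        byte[] name = measurement.getBytes(StandardCharsets.UTF_8);
        buf.putInt(name.length).put(name).putInt(chunkOffsets.length);
        for (long offset : chunkOffsets) {
            buf.putLong(offset);
        }
    }

    /** Deserialize every SeriesMeta of one device from its contiguous byte range. */
    static List<SeriesMeta> readDevice(ByteBuffer file, DeviceRange range) {
        ByteBuffer slice = file.duplicate();
        slice.position(range.startOffset);
        slice.limit(range.startOffset + range.totalSize);
        List<SeriesMeta> result = new ArrayList<>();
        while (slice.hasRemaining()) {
            byte[] name = new byte[slice.getInt()];
            slice.get(name);
            int count = slice.getInt();
            List<Long> offsets = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                offsets.add(slice.getLong());
            }
            result.add(new SeriesMeta(new String(name, StandardCharsets.UTF_8), offsets));
        }
        return result;
    }

    /** Step 1: device map lookup. Step 2: read the whole device block. Step 3: pick the measurement. */
    static List<Long> chunkOffsets(ByteBuffer file, Map<String, DeviceRange> deviceMap,
                                   String device, String measurement) {
        DeviceRange range = deviceMap.get(device);
        if (range == null) {
            return Collections.emptyList();
        }
        for (SeriesMeta meta : readDevice(file, range)) {
            if (meta.measurement.equals(measurement)) {
                return meta.chunkOffsets;
            }
        }
        return Collections.emptyList();
    }

    /** Build a small fake metadata section with devices d0 (s0, s1) and d1 (s0), filling the device map. */
    static ByteBuffer buildDemo(Map<String, DeviceRange> deviceMap) {
        ByteBuffer buf = ByteBuffer.allocate(256);
        int start = buf.position();
        write(buf, "s0", 100L);
        write(buf, "s1", 200L, 300L);
        deviceMap.put("d0", new DeviceRange(start, buf.position() - start));
        start = buf.position();
        write(buf, "s0", 400L);
        deviceMap.put("d1", new DeviceRange(start, buf.position() - start));
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        Map<String, DeviceRange> deviceMap = new HashMap<>();
        ByteBuffer file = buildDemo(deviceMap);
        System.out.println(chunkOffsets(file, deviceMap, "d0", "s1")); // prints [200, 300]
    }
}
```

In this sketch a query for "d0.s1" touches one map entry and one contiguous range, which is the point of storing only the start offset and total size per device.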
>
> Thanks,
>
> Haonan Hou
>
> On Feb 11, 2020, at 8:08 PM, Jialin Qiao <[email protected]> wrote:
>
> Hi,
>
> If each device only stores each offset of TimeseriesMetadata like this:
> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>
> It could be simplified to recording the start offset and end offset:
> TsFileMetaData ---> [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]
>
> And finally, it could be replaced by:
> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
>
> 乔嘉林
> 清华大学 软件学院
>
> On Tue, Feb 11, 2020 at 7:59 PM, atoiLiu <[email protected]> wrote:
>
> Hi,
>
> Thank you for your reply.
> I am very happy that you can take my suggestion.
>
> Thanks
>
> Dawei Liu
>
> On Feb 11, 2020, at 6:04 PM, Haonan Hou <[email protected]> wrote:
>
> Hi Dawei,
>
> Thank you so much for sharing your opinion about the new TsFile!
> I am very happy to take your suggestions.
>
> You said we can remove TsOffsetArray and directly store the offset of TimeseriesMetaData. I agree with you. It is better than my version.
> Besides, for the optimization of TimeseriesMetaData, I would like to discuss with other people to determine which way is better.
>
> Best,
>
> Haonan Hou
>
> On Feb 11, 2020, at 5:35 PM, atoiLiu <[email protected]> wrote:
>
> Hi,
>
> I'm learning the new TsFile in PR [1], but I think TsFileMetaData has a bad design.
>
> TsFileMetaData has a TsOffsetArray, which records every offset of TimeseriesMetaData, and uses a Map<deviceId, int[]> to record the startIndex and endIndex into TsOffsetArray. It looks like:
>
> TsFileMetaData ---> { [0,1,2,3,4,5, …], [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ] }
>
> We can delete TsOffsetArray and store the offsets directly in the deviceIndexArray; then TsFileMetaData will have a Map<deviceId, List<Long>> to record them.
> This change will save 4 bytes per device on disk, because every device just needs to record the number of offsets and the offsets themselves. It looks like:
>
> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>
> In addition, TimeSeriesMetaData is an ordered structure on the hard disk, and the TimeSeriesMetaData entries for each device are stored next to each other, so TsFileMetaData does not need to store all the offset information. There are two optimization directions:
>
> 1. Save startTime, endTime and offset for each TimeSeriesMetaData in TsFileMetaData. The nice thing about this is that when you read TsFileMetaData from your hard drive, you can directly filter out the TimeSeriesMetaData that does not need to be read.
>
> 2. Only save the start TimeSeriesMetaData offset in TsFileMetaData, so that you can loop through the entries and only need to seek once. It looks like:
>
> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>
> [1] https://github.com/apache/incubator-iotdb/pull/736
>
> Thanks
>
> Dawei Liu
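The start-offset-only variant discussed in this thread relies on each device's entries being contiguous and ordered, so a device's range ends where the next device's range starts. A minimal sketch of that idea, using the same toy numbers as the examples above, could look like this. The class and method names, and the sectionEnd parameter marking the end of the metadata section, are assumptions for illustration and are not taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StartOffsetIndexSketch {
    /** Collapse per-device offset lists, e.g. d0 -> [0,1,2], to the first offset only, e.g. d0 -> 0. */
    static Map<String, Long> startOnly(Map<String, long[]> allOffsets) {
        Map<String, Long> starts = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> entry : allOffsets.entrySet()) {
            starts.put(entry.getKey(), entry.getValue()[0]);
        }
        return starts;
    }

    /**
     * Recover [start, endExclusive) for one device. The end is the smallest start offset
     * greater than this device's start, or sectionEnd (hypothetical) for the last device.
     */
    static long[] rangeOf(Map<String, Long> starts, String device, long sectionEnd) {
        Long start = starts.get(device);
        if (start == null) {
            throw new IllegalArgumentException("unknown device: " + device);
        }
        long end = sectionEnd;
        for (long other : starts.values()) {
            if (other > start && other < end) {
                end = other;
            }
        }
        return new long[] {start, end};
    }

    public static void main(String[] args) {
        Map<String, long[]> full = new LinkedHashMap<>();
        full.put("d0", new long[] {0, 1, 2});
        full.put("d1", new long[] {3, 4, 5});
        Map<String, Long> starts = startOnly(full);
        long[] range = rangeOf(starts, "d0", 6);
        System.out.println(range[0] + ".." + range[1]); // prints 0..3
    }
}
```

The trade-off in this sketch is that the end of a range is computed from neighbouring entries instead of being stored, which only works while the on-disk order assumption holds.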
