Hi, I can see it now. It looks very nice.
Thanks
Dawei Liu

> On Feb 12, 2020, at 6:52 PM, Haonan Hou <[email protected]> wrote:
>
> Sure, already added.
>
> Thanks,
> Haonan Hou
>
>> On Feb 12, 2020, at 5:57 PM, Jialin Qiao <[email protected]> wrote:
>>
>> Hi Haonan,
>>
>> I cannot see the picture. Could you please put it in the PR?
>>
>> Thanks,
>> --
>> Jialin Qiao
>> School of Software, Tsinghua University
>>
>> 乔嘉林
>> 清华大学 软件学院
>>
>> -----Original Message-----
>> From: "Haonan Hou" <[email protected]>
>> Sent: 2020-02-12 16:27:22 (Wednesday)
>> To: "[email protected]" <[email protected]>
>> Cc:
>> Subject: Re: Suggestions for new TsFile
>>
>> Hi,
>>
>> We have a newer design of TsFile, which combines the suggestions from
>> Jialin and Dawei.
>>
>> The main differences are as follows:
>>
>> 1. Remove TsOffsetArray.
>> 2. Modify the device map in TsFileMetaData to store the start offset of
>>    the first TimeseriesMetadata and the total data size of all
>>    TimeseriesMetadatas in each device.
>>
>> The newer TsFile structure should look like this:
>>
>> Here is an example of how the new structure works.
>>
>> When we try to get the List<ChunkMetadata> of timeseries "d0.s1", we first
>> deserialize the map in TsFileMetadata, which gives us the startOffset of
>> TimeseriesMetadata "s0" (the first TimeseriesMetadata of "d0") and the
>> data size of all TimeseriesMetadatas in "d0".
>>
>> After that, we are able to deserialize all TimeseriesMetadata in "d0".
>>
>> Finally, we have the TimeseriesMetadata "d0.s1" and can get the
>> ChunkMetadata list of "d0.s1".
>>
>> Thanks,
>>
>> Haonan Hou
>>
>> On Feb 11, 2020, at 8:08 PM, Jialin Qiao <[email protected]> wrote:
>>
>> Hi,
>>
>> If each device only stores the offset of each TimeseriesMetadata, like this:
>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>>
>> it could be simplified to recording only the start offset and end offset:
>> TsFileMetaData ---> [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]
>>
>> And finally, it could be replaced by:
>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>>
>> Thanks,
>> —————————————————
>> Jialin Qiao
>> School of Software, Tsinghua University
>>
>> 乔嘉林
>> 清华大学 软件学院
>>
>> On Tue, Feb 11, 2020, at 7:59 PM, atoiLiu <[email protected]> wrote:
>>
>> Hi,
>>
>> Thank you for your reply.
>> I am very happy that you can take my suggestion.
>>
>> Thanks
>>
>> Dawei Liu
>>
>> On Feb 11, 2020, at 6:04 PM, Haonan Hou <[email protected]> wrote:
>>
>> Hi Dawei,
>>
>> Thank you so much for sharing your opinion on the new TsFile!
>> I am very happy to take your suggestions.
>>
>> You said we can remove TsOffsetArray and directly store the offsets of
>> TimeseriesMetaData. I agree with you. It is better than my version.
>> As for the optimization of TimeseriesMetaData, I would like to discuss it
>> with other people to determine which way is better.
>>
>> Best,
>>
>> Haonan Hou
>>
>> On Feb 11, 2020, at 5:35 PM, atoiLiu <[email protected]> wrote:
>>
>> Hi,
>>
>> I'm studying the new TsFile in PR [1], and I think TsFileMetaData has a
>> design problem.
>>
>> TsFileMetaData has a TsOffsetArray, which records the offset of every
>> TimeseriesMetaData, and uses a Map<deviceId, int[]> to record the
>> startIndex and endIndex into TsOffsetArray. It looks like this:
>>
>> TsFileMetaData ---> { [0,1,2,3,4,5, …], [ {deviceId(d0), [0,2] },
>> {deviceId(d1), [3,5] }, … ] }
>>
>> We can delete TsOffsetArray and store the offsets directly in the
>> deviceIndexArray, so that TsFileMetaData has a Map<deviceId, List<Long>>
>> to record them. This change saves 4 bytes per device on disk, because each
>> device only needs to record the number of offsets and the offsets
>> themselves. It looks like this:
>>
>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>>
>> In addition, TimeSeriesMetaData is stored in order on disk, and the
>> TimeSeriesMetaData entries of each device are stored together, so
>> TsFileMetaData does not need to store every offset. There are two possible
>> optimization directions:
>>
>> 1. Save the startTime, endTime and offset of each TimeSeriesMetaData in
>>    TsFileMetaData. The nice thing about this is that, once TsFileMetaData
>>    has been read from disk, you can directly filter out the
>>    TimeSeriesMetaData entries that do not need to be read.
>>
>> 2. Save only the offset of the first TimeSeriesMetaData of each device in
>>    TsFileMetaData, so that you can iterate through the entries with a
>>    single seek. It looks like this:
>>
>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>>
>> [1] https://github.com/apache/incubator-iotdb/pull/736
>>
>> Thanks
>>
>> Dawei Liu
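
For anyone reading this thread later, the "d0.s1" lookup that Haonan describes could be sketched in Java roughly as below. This is only an illustration under stated assumptions, not the code in the PR: the class DeviceIndexSketch, its method names, and the byte layout of each entry (a length-prefixed measurement name followed by a count and a list of chunk offsets) are made up for the example. Only the idea of mapping each device to the start offset of its first TimeseriesMetadata plus the total size of its TimeseriesMetadatas comes from the discussion.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names exist in IoTDB.
final class DeviceIndexSketch {

    // Appends one simplified "TimeseriesMetadata" entry:
    // [name length][name bytes][chunk count][chunk offsets...]
    static void writeEntry(ByteBuffer file, String measurement, long... chunkOffsets) {
        byte[] name = measurement.getBytes(StandardCharsets.UTF_8);
        file.putInt(name.length).put(name).putInt(chunkOffsets.length);
        for (long offset : chunkOffsets) {
            file.putLong(offset);
        }
    }

    // Writes the entries of two sample devices back to back and returns the
    // device map: deviceId -> {start offset of first entry, total size in bytes}.
    static Map<String, long[]> writeSampleDevices(ByteBuffer file) {
        Map<String, long[]> index = new LinkedHashMap<>();

        int start = file.position();
        writeEntry(file, "s0", 100L);
        writeEntry(file, "s1", 200L, 300L);
        writeEntry(file, "s2", 400L);
        index.put("d0", new long[] {start, file.position() - start});

        start = file.position();
        writeEntry(file, "s0", 500L);
        index.put("d1", new long[] {start, file.position() - start});
        return index;
    }

    // Looks up the chunk offsets of device.measurement: one jump to the device's
    // region, then a sequential scan of that device's entries only.
    static List<Long> chunkOffsets(ByteBuffer file, Map<String, long[]> index,
                                   String device, String measurement) {
        long[] range = index.get(device);
        if (range == null) {
            return Collections.emptyList();
        }
        ByteBuffer region = file.duplicate();
        region.position((int) range[0]);
        region.limit((int) (range[0] + range[1]));

        while (region.hasRemaining()) {
            byte[] name = new byte[region.getInt()];
            region.get(name);
            int count = region.getInt();
            List<Long> offsets = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                offsets.add(region.getLong());
            }
            if (new String(name, StandardCharsets.UTF_8).equals(measurement)) {
                return offsets;
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        ByteBuffer file = ByteBuffer.allocate(256);
        Map<String, long[]> index = writeSampleDevices(file);
        System.out.println(chunkOffsets(file, index, "d0", "s1"));
    }
}
```

The trade-off in this sketch is the same one discussed above: the device map stays small (two numbers per device), at the cost of scanning all entries of a device to find one measurement.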
