Hi,

I can see it now. It looks very good.

Thanks

Dawei Liu

> On Feb 12, 2020, at 6:52 PM, Haonan Hou <[email protected]> wrote:
> 
> Sure, already added.
> 
> Thanks,
> 
> Haonan Hou
> 
>> On Feb 12, 2020, at 5:57 PM, Jialin Qiao <[email protected]> wrote:
>> 
>> Hi Haonan,
>> 
>> 
>> I cannot see the picture. Could you please put it in the PR?
>> 
>> Thanks,
>> --
>> Jialin Qiao
>> School of Software, Tsinghua University
>> 
>> 乔嘉林
>> 清华大学 软件学院
>> 
>> -----Original Message-----
>> From: "Haonan Hou" <[email protected]>
>> Sent: 2020-02-12 16:27:22 (Wednesday)
>> To: "[email protected]" <[email protected]>
>> Cc:
>> Subject: Re: Suggestions for new TsFile
>> 
>> Hi, 
>> 
>> 
>> We have a newer design of TsFile, which combines the suggestions from Jialin 
>> and Dawei. 
>> 
>> 
>> The main differences are as follows:
>> 
>> 
>> 1. Remove TsOffsetArray.
>> 2. Modify the device map in TsFileMetaData to store the start offset of 
>> first TimeseriesMetadata and total data size of all TimeseriesMetadatas in 
>> each device. 
>> 
>> 
>> 
>> 
>> The newer TsFile structure should look like this:
>> 
>> Here is an example of how the new structure works.
>> 
>> 
>> When we try to get the List<ChunkMetadata> of timeseries "d0.s1", we first
>> deserialize the map in TsFileMetadata, which gives us the startOffset of
>> TimeseriesMetadata "s0" (the first TimeseriesMetadata of "d0") and the
>> total data size of all TimeseriesMetadatas in "d0".
>> 
>> 
>> After that, we are able to deserialize all TimeseriesMetadatas in "d0".
>> 
>> 
>> Finally, we have the TimeseriesMetadata "d0.s1" and can get the
>> ChunkMetadata list of "d0.s1".
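The lookup steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: `DeviceRange`, `TimeseriesMeta`, `DeviceLookupSketch` and the toy byte encoding are all hypothetical, and an in-memory `ByteBuffer` stands in for the file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical per-device entry of the map in TsFileMetadata.
final class DeviceRange {
  final long startOffset; // offset of the device's first TimeseriesMetadata
  final int totalSize;    // total bytes of all TimeseriesMetadatas of the device

  DeviceRange(long startOffset, int totalSize) {
    this.startOffset = startOffset;
    this.totalSize = totalSize;
  }
}

// Hypothetical, stripped-down TimeseriesMetadata.
final class TimeseriesMeta {
  final String measurementId;
  final long chunkMetadataListOffset; // where the ChunkMetadata list would live

  TimeseriesMeta(String measurementId, long chunkMetadataListOffset) {
    this.measurementId = measurementId;
    this.chunkMetadataListOffset = chunkMetadataListOffset;
  }
}

final class DeviceLookupSketch {
  // Assumed toy encoding: [int nameLength][UTF-8 name][long chunkMetadataListOffset].
  static void write(ByteBuffer out, String measurementId, long chunkOffset) {
    byte[] name = measurementId.getBytes(StandardCharsets.UTF_8);
    out.putInt(name.length);
    out.put(name);
    out.putLong(chunkOffset);
  }

  // Step 2: deserialize every TimeseriesMetadata of one device from its byte range.
  static List<TimeseriesMeta> readDevice(ByteBuffer file, DeviceRange range) {
    ByteBuffer slice = file.duplicate(); // independent position and limit
    slice.limit((int) range.startOffset + range.totalSize);
    slice.position((int) range.startOffset);
    List<TimeseriesMeta> result = new ArrayList<>();
    while (slice.hasRemaining()) {
      byte[] name = new byte[slice.getInt()];
      slice.get(name);
      result.add(new TimeseriesMeta(new String(name, StandardCharsets.UTF_8), slice.getLong()));
    }
    return result;
  }

  // Steps 1 to 3: device map -> byte range -> scan for the wanted measurement.
  static TimeseriesMeta find(ByteBuffer file, Map<String, DeviceRange> deviceMap,
                             String device, String measurement) {
    DeviceRange range = deviceMap.get(device);
    if (range == null) {
      return null;
    }
    for (TimeseriesMeta meta : readDevice(file, range)) {
      if (meta.measurementId.equals(measurement)) {
        return meta;
      }
    }
    return null;
  }

  // Builds a tiny fake file holding d0 -> s0, s1, s2 and d1 -> s0, and fills the device map.
  static ByteBuffer demoFile(Map<String, DeviceRange> deviceMap) {
    ByteBuffer buf = ByteBuffer.allocate(256);
    String[][] layout = {{"d0", "s0", "s1", "s2"}, {"d1", "s0"}};
    long fakeChunkOffset = 1000;
    for (String[] device : layout) {
      int start = buf.position();
      for (int i = 1; i < device.length; i++) {
        write(buf, device[i], fakeChunkOffset);
        fakeChunkOffset += 100;
      }
      deviceMap.put(device[0], new DeviceRange(start, buf.position() - start));
    }
    buf.flip();
    return buf;
  }
}
```

With the demo data, `find(file, deviceMap, "d0", "s1")` reads only the byte range of "d0" and returns the entry for "s1"; the other devices are never deserialized.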
>> 
>> 
>> Thanks,
>> 
>> 
>> Haonan Hou
>> 
>> 
>> 
>> 
>> On Feb 11, 2020, at 8:08 PM, Jialin Qiao <[email protected]> wrote:
>> 
>> 
>> Hi,
>> 
>> If each device stores only the offsets of its TimeseriesMetadata, like this:
>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>> 
>> it could be simplified to record only the start offset and end offset:
>> TsFileMetaData ---> [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]
>> 
>> And finally, it could be replaced by the start offset alone:
>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
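A minimal sketch of why the start offset alone can be enough, assuming the devices are kept in on-disk order and each device's entries are contiguous: the end of one device is simply the start of the next. The class and method names here are hypothetical, and the numbers are the ones used in the notation above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OffsetCollapseSketch {
  // Reduces {device -> all offsets} to {device -> first offset}.
  // Assumes every device has at least one offset and the map iterates in on-disk order.
  static Map<String, Long> keepStartOnly(Map<String, List<Long>> perDeviceOffsets) {
    Map<String, Long> starts = new LinkedHashMap<>();
    for (Map.Entry<String, List<Long>> entry : perDeviceOffsets.entrySet()) {
      starts.put(entry.getKey(), entry.getValue().get(0));
    }
    return starts;
  }

  // Recovers the exclusive end of a device's range: the next device's start,
  // or endOfSection (assumed to be known from elsewhere) for the last device.
  static long endExclusive(Map<String, Long> starts, String device, long endOfSection) {
    boolean found = false;
    for (Map.Entry<String, Long> entry : starts.entrySet()) {
      if (found) {
        return entry.getValue();
      }
      if (entry.getKey().equals(device)) {
        found = true;
      }
    }
    if (!found) {
      throw new IllegalArgumentException("unknown device: " + device);
    }
    return endOfSection;
  }
}
```

For `{d0=[0,1,2], d1=[3,4,5]}` this keeps `{d0=0, d1=3}`, and the range of d0 ends where d1 starts.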
>> 
>> Thanks,
>> —————————————————
>> Jialin Qiao
>> School of Software, Tsinghua University
>> 
>> 乔嘉林
>> 清华大学 软件学院
>> 
>> 
>> On Tue, Feb 11, 2020 at 7:59 PM, atoiLiu <[email protected]> wrote:
>> 
>> Hi,
>> 
>> Thank you for your reply.
>> I am very happy that you can take my suggestion.
>> 
>> 
>> Thanks
>> 
>> Dawei Liu
>> 
>> 
>> On Feb 11, 2020, at 6:04 PM, Haonan Hou <[email protected]> wrote:
>> 
>> Hi Dawei,
>> 
>> Thank you so much for sharing your opinion about the new TsFile!
>> I am very happy to take your suggestions.
>> 
>> You said we can remove TsOffsetArray and directly store the offsets of
>> TimeseriesMetaData. I agree with you. It is better than my version.
>> Besides, for the optimization of TimeseriesMetaData, I would like to
>> discuss with other people to determine which way is better.
>> 
>> Best,
>> 
>> Haonan Hou
>> 
>> 
>> On Feb 11, 2020, at 5:35 PM, atoiLiu <[email protected]> wrote:
>> 
>> Hi,
>> 
>> I'm studying the new TsFile in PR [1], and I think TsFileMetaData is not
>> well designed.
>> 
>> TsFileMetaData has a TsOffsetArray, which records the offset of every
>> TimeseriesMetaData, and uses a Map<deviceId, int[]> to record each device's
>> startIndex and endIndex in TsOffsetArray. It looks like this:
>> 
>> TsFileMetaData ---> { [0,1,2,3,4,5, …], [ {deviceId(d0), [0,2] },
>> {deviceId(d1), [3,5] }, … ] }
>> 
>> We can delete TsOffsetArray and store the offsets directly in the
>> deviceIndexArray, so that TsFileMetaData has a Map<deviceId, List<Long>>
>> to record them. This change saves 4 bytes per device on disk, because each
>> device only needs to record the number of offsets and the offsets
>> themselves. It looks like this:
>> 
>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
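The 4-byte figure can be checked with a little arithmetic. The sketch below assumes 8-byte offsets and 4-byte ints, and ignores the deviceId bytes and any array length header, since those are the same in both layouts or are not described in this thread. `IndexSizeSketch` is a hypothetical name.

```java
final class IndexSizeSketch {
  static final int OFFSET_BYTES = Long.BYTES; // assumed: each offset is a long

  // Old layout: shared TsOffsetArray plus (startIndex, endIndex) ints per device.
  static long oldSize(int[] timeseriesPerDevice) {
    long size = 0;
    for (int count : timeseriesPerDevice) {
      size += (long) count * OFFSET_BYTES; // the device's share of TsOffsetArray
      size += 2L * Integer.BYTES;          // startIndex + endIndex
    }
    return size;
  }

  // Proposed layout: per device, an int count followed by the offsets themselves.
  static long newSize(int[] timeseriesPerDevice) {
    long size = 0;
    for (int count : timeseriesPerDevice) {
      size += Integer.BYTES;               // number of offsets
      size += (long) count * OFFSET_BYTES; // the offsets
    }
    return size;
  }
}
```

For two devices with three timeseries each, the old layout takes 64 bytes and the new one 56 under these assumptions, which is 4 bytes less per device.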
>> 
>> 
>> In addition, TimeSeriesMetaData is stored in order on disk, and the
>> TimeSeriesMetaData of each device are stored contiguously, so
>> TsFileMetaData does not need to store every offset. There are two
>> possible optimization directions:
>> 
>> 1. Save the startTime, endTime and offset of each TimeSeriesMetaData in
>> TsFileMetaData. The advantage is that once TsFileMetaData has been read
>> from disk, you can apply a time filter directly and skip the
>> TimeSeriesMetaData that do not need to be read.
>> 
>> 
>> 2. Save only the offset of each device's first TimeSeriesMetaData in
>> TsFileMetaData, so that you can scan through them sequentially with a
>> single seek. It looks like this:
>> 
>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
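Option 1 might look roughly like the sketch below: if each index entry carries a startTime and endTime next to its offset, a query can discard non-overlapping TimeSeriesMetaData before touching the disk again. `TimeIndexEntry` and `TimeFilterSketch` are hypothetical names, and the closed-interval overlap test is an assumption about how the filter would work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical index entry for option 1.
final class TimeIndexEntry {
  final long startTime;
  final long endTime;
  final long offset; // offset of the TimeSeriesMetaData on disk

  TimeIndexEntry(long startTime, long endTime, long offset) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.offset = offset;
  }
}

final class TimeFilterSketch {
  // Returns the offsets whose [startTime, endTime] overlaps [queryStart, queryEnd].
  static List<Long> offsetsToRead(List<TimeIndexEntry> entries, long queryStart, long queryEnd) {
    List<Long> result = new ArrayList<>();
    for (TimeIndexEntry entry : entries) {
      if (entry.endTime >= queryStart && entry.startTime <= queryEnd) {
        result.add(entry.offset);
      }
    }
    return result;
  }
}
```

The trade-off is size: option 1 stores two extra timestamps per TimeSeriesMetaData in TsFileMetaData, while option 2 stores a single offset per device and relies on one sequential scan.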
>> 
>> 
>> 
>> [1] https://github.com/apache/incubator-iotdb/pull/736
>> 
>> Thanks
>> 
>> Dawei Liu
>> 
>> 
>> 
>> 
>> 
> 
