Hi Haonan,

I cannot see the picture. Could you please put it in the PR?

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University
乔嘉林 清华大学 软件学院

-----Original Message-----
From: "Haonan Hou" <[email protected]>
Sent: 2020-02-12 16:27:22 (Wednesday)
To: "[email protected]" <[email protected]>
Cc:
Subject: Re: Suggestions for new TsFile

Hi,

We have a newer design of TsFile, which combines the suggestions from Jialin and Dawei. The main differences are as follows:

1. Remove TsOffsetArray.
2. Modify the device map in TsFileMetaData to store, for each device, the start offset of its first TimeseriesMetadata and the total data size of all its TimeseriesMetadata.

The new TsFile structure should look like this:

Here is an example of how the new structure works. When we want to get the List<ChunkMetadata> of the timeseries "d0.s1", we first deserialize the map in TsFileMetadata. This gives us the startOffset of TimeseriesMetadata "s0", which is the first TimeseriesMetadata of "d0", and the total data size of all TimeseriesMetadata in "d0". With that, we can deserialize all TimeseriesMetadata in "d0". Finally, we have the TimeseriesMetadata of "d0.s1" and can get the ChunkMetadata list of "d0.s1".

Thanks,
Haonan Hou

On Feb 11, 2020, at 8:08 PM, Jialin Qiao <[email protected]> wrote:

Hi,

If each device only stores the offset of each of its TimeseriesMetadata, like this:

    TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]

it could be simplified to recording only the start offset and end offset:

    TsFileMetaData ---> [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]

And finally, it could be replaced by the start offset alone:

    TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University
乔嘉林 清华大学 软件学院

On Tue, Feb 11, 2020, at 7:59 PM, atoiLiu <[email protected]> wrote:

Hi,

Thank you for your reply. I am very happy that you can take my suggestion.

Thanks
Dawei Liu

On Feb 11, 2020, at 6:04 PM, Haonan Hou <[email protected]> wrote:

Hi Dawei,

Thank you so much for sharing your opinion about the new TsFile! I am very happy to take your suggestions. You said we can remove TsOffsetArray and directly store the offsets of TimeseriesMetaData. I agree with you. It is better than my version. As for the optimization of TimeseriesMetaData, I would like to discuss it with others to decide which approach is better.

Best,
Haonan Hou

On Feb 11, 2020, at 5:35 PM, atoiLiu <[email protected]> wrote:

Hi,

I'm studying the new TsFile in PR [1], and I think TsFileMetaData is poorly designed. TsFileMetaData has a TsOffsetArray, which records the offset of every TimeseriesMetaData, and uses a Map<deviceId, int[]> to record each device's startIndex and endIndex into TsOffsetArray. It looks like this:

    TsFileMetaData ---> {
      [0,1,2,3,4,5, …],
      [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]
    }

We can delete TsOffsetArray and store the offsets directly in the deviceIndexArray, so that TsFileMetaData has a Map<deviceId, List<Long>> to record them. This change will save 4 bytes per device on disk, because each device only needs to record the number of offsets and the offsets themselves. It looks like this:

    TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]

In addition, TimeSeriesMetaData is an ordered structure on disk, and the TimeSeriesMetaData entries of each device are stored together, so TsFileMetaData does not need to store every offset. There are two optimization directions:

1. Save the startTime, endTime and offset of each TimeSeriesMetaData in TsFileMetaData. The nice thing about this is that, once TsFileMetaData has been read from disk, you can apply a filter directly to skip the TimeSeriesMetaData that do not need to be read.
2. Save only the offset of the first TimeSeriesMetaData of each device in TsFileMetaData, so that you can iterate through the device's entries with a single seek. It looks like this:

    TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]

[1] https://github.com/apache/incubator-iotdb/pull/736

Thanks
Dawei Liu
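
For readers following the thread, the design in the newest message (a device map that stores a start offset and a total data size per device) can be sketched in Java roughly as below. This is only an illustration under stated assumptions: the class DeviceIndexSketch, the DeviceEntry type, the helper methods and the toy byte layout are all hypothetical and are not the actual TsFile format or IoTDB API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeviceIndexSketch {

    // Hypothetical index entry: start offset of the device's first
    // TimeseriesMetadata and total size of all of them, in bytes.
    static final class DeviceEntry {
        final int startOffset;
        final int dataSize;

        DeviceEntry(int startOffset, int dataSize) {
            this.startOffset = startOffset;
            this.dataSize = dataSize;
        }
    }

    // Assumed toy layout of one TimeseriesMetadata (not the real format):
    // [nameLength:int][name:bytes][chunkCount:int][chunkOffset:long]*
    static void writeSeries(ByteBuffer out, String name, long... chunkOffsets) {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        out.putInt(bytes.length).put(bytes).putInt(chunkOffsets.length);
        for (long offset : chunkOffsets) {
            out.putLong(offset);
        }
    }

    // Scan the contiguous block of one device and return the chunk metadata
    // offsets of the requested measurement, or null if it is not found.
    static List<Long> lookup(ByteBuffer file, Map<String, DeviceEntry> index,
                             String device, String measurement) {
        DeviceEntry entry = index.get(device);
        if (entry == null) {
            return null;
        }
        ByteBuffer block = file.duplicate();
        block.limit(entry.startOffset + entry.dataSize);
        block.position(entry.startOffset);
        while (block.hasRemaining()) {
            byte[] nameBytes = new byte[block.getInt()];
            block.get(nameBytes);
            int chunkCount = block.getInt();
            List<Long> chunks = new ArrayList<>();
            for (int i = 0; i < chunkCount; i++) {
                chunks.add(block.getLong());
            }
            if (new String(nameBytes, StandardCharsets.UTF_8).equals(measurement)) {
                return chunks;
            }
        }
        return null;
    }

    // Build a tiny in-memory "file" with devices d0 (s0, s1) and d1 (s0),
    // filling the device map as each device's block is written.
    static ByteBuffer buildExample(Map<String, DeviceEntry> index) {
        ByteBuffer file = ByteBuffer.allocate(256);
        int d0Start = file.position();
        writeSeries(file, "s0", 100L);
        writeSeries(file, "s1", 200L, 300L);
        index.put("d0", new DeviceEntry(d0Start, file.position() - d0Start));
        int d1Start = file.position();
        writeSeries(file, "s0", 400L);
        index.put("d1", new DeviceEntry(d1Start, file.position() - d1Start));
        return file;
    }

    public static void main(String[] args) {
        Map<String, DeviceEntry> index = new HashMap<>();
        ByteBuffer file = buildExample(index);
        System.out.println(lookup(file, index, "d0", "s1")); // prints [200, 300]
    }
}
```

The point of the sketch is that the map holds one fixed-size entry per device, and a lookup for "d0.s1" reads only the contiguous block of "d0" before scanning it for "s1", which mirrors the d0.s1 walk-through earlier in the thread.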
