Hi,

Thank you for your reply.
I am very glad that you will take my suggestion.


Thanks

Dawei Liu


> On Feb 11, 2020, at 6:04 PM, Haonan Hou <[email protected]> wrote:
> 
> Hi Dawei,
> 
> Thank you so much for sharing your opinion about the new TsFile!
> I am very happy to take your suggestions.
> 
> You said we can remove TsOffsetArray and directly store the offset of
> TimeseriesMetaData. I agree with you. It is better than my version.
> Besides, for the optimization of TimeseriesMetaData, I would like to discuss
> it with other people to determine which way is better.
> 
> Best,
> 
> Haonan Hou
> 
> 
>> On Feb 11, 2020, at 5:35 PM, atoiLiu <[email protected]> wrote:
>> 
>> Hi,
>> 
>> I’m studying the new TsFile in PR [1], but I think the design of
>> TsFileMetaData is flawed.
>> 
>> TsFileMetaData has a TsOffsetArray, which records the offset of every
>> TimeseriesMetaData, and uses a Map<deviceId, int[]> to record each device's
>> startIndex and endIndex in TsOffsetArray. It looks like this:
>>
>> TsFileMetaData ---> { [0,1,2,3,4,5, …], [ {deviceId(d0), [0,2] },
>> {deviceId(d1), [3,5] }, … ] }
>> 
>> We can delete TsOffsetArray and store the offsets directly in the
>> deviceIndexArray, so that TsFileMetaData has a Map<deviceId, List<Long>>
>> to record them. This change will save 4 bytes per device on disk, because
>> each device only needs to record the number of offsets and the offsets
>> themselves, instead of a startIndex and an endIndex. It looks like this:
>>
>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
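
A minimal Java sketch of the change described above, for illustration only. The class OffsetLayoutSketch and its method names are hypothetical and are not taken from the IoTDB code base. The byte counts assume a 4-byte int and an 8-byte long, treat [startIndex, endIndex] as inclusive (as in the d0/d1 example), and ignore the deviceId strings and any other headers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from IoTDB.
class OffsetLayoutSketch {

  // Converts the current layout (shared offset array plus an inclusive
  // [startIndex, endIndex] pair per device) into the proposed layout
  // (each device mapped directly to its own offsets).
  static Map<String, List<Long>> flatten(long[] tsOffsetArray,
                                         Map<String, int[]> deviceIndex) {
    Map<String, List<Long>> result = new LinkedHashMap<>();
    for (Map.Entry<String, int[]> e : deviceIndex.entrySet()) {
      int start = e.getValue()[0];
      int end = e.getValue()[1];
      List<Long> offsets = new ArrayList<>();
      for (int i = start; i <= end; i++) {
        offsets.add(tsOffsetArray[i]);
      }
      result.put(e.getKey(), offsets);
    }
    return result;
  }

  // Current layout: 8 bytes per offset, plus two 4-byte ints per device.
  static long currentBytes(int totalOffsets, int devices) {
    return 8L * totalOffsets + 8L * devices;
  }

  // Proposed layout: 8 bytes per offset, plus one 4-byte count per device.
  static long proposedBytes(int totalOffsets, int devices) {
    return 8L * totalOffsets + 4L * devices;
  }
}
```

Under these assumptions the difference between currentBytes and proposedBytes is 4 bytes times the number of devices, which matches the saving claimed above.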
>> 
>> 
>> In addition, TimeSeriesMetaData is an ordered structure on disk, and the
>> TimeSeriesMetaData entries of each device are stored together, so
>> TsFileMetaData does not need to store all of the offset information. There
>> are two optimization directions:
>> 
>> 1. Save the startTime, endTime and offset of each TimeSeriesMetaData in
>> TsFileMetaData. The advantage is that once TsFileMetaData has been read
>> from disk, you can apply a filter directly and skip every
>> TimeSeriesMetaData that does not need to be read.
>> 
>> 
>> 2. Save only the offset of the first TimeSeriesMetaData of each device in
>> TsFileMetaData, so that you can read through the entries sequentially
>> with a single seek. It looks like this:
>>
>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
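
The two directions can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation. MetadataIndexSketch, Entry, offsetsToRead and scanRange are hypothetical names, and time ranges are treated as inclusive. For direction 2, the sketch assumes a device's entries end where the next device's entries begin (or at the end of the metadata region, passed in as regionEnd), which the proposal itself does not specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from IoTDB.
class MetadataIndexSketch {

  // Direction 1: one index entry per TimeSeriesMetaData.
  static final class Entry {
    final long startTime;
    final long endTime;
    final long offset;

    Entry(long startTime, long endTime, long offset) {
      this.startTime = startTime;
      this.endTime = endTime;
      this.offset = offset;
    }
  }

  // Returns the offsets of the entries whose [startTime, endTime] overlaps
  // the query range, so all other entries never need to be read from disk.
  static List<Long> offsetsToRead(List<Entry> entries, long queryStart, long queryEnd) {
    List<Long> result = new ArrayList<>();
    for (Entry e : entries) {
      if (e.endTime >= queryStart && e.startTime <= queryEnd) {
        result.add(e.offset);
      }
    }
    return result;
  }

  // Direction 2: only the first offset per device is stored. Returns the
  // half-open range [start, end) to read sequentially after one seek.
  // Assumption: the range ends at the nearest following device start,
  // or at regionEnd when no later device exists.
  static long[] scanRange(Map<String, Long> startOffsets, String deviceId, long regionEnd) {
    long start = startOffsets.get(deviceId);
    long end = regionEnd;
    for (long other : startOffsets.values()) {
      if (other > start && other < end) {
        end = other;
      }
    }
    return new long[] {start, end};
  }
}
```

Direction 1 trades a larger TsFileMetaData for the ability to prune by time before touching the entries; direction 2 keeps the index minimal but always scans a device's whole run.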
>> 
>> 
>> 
>> [1] https://github.com/apache/incubator-iotdb/pull/736 
>> 
>> Thanks
>> 
>> Dawei Liu
> 
