Hi,

If each device stores every offset of its TimeseriesMetadata, like this:
TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]

It could be simplified to record only the start offset and the end offset:
TsFileMetaData ---> [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]
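
As a rough illustration of this step, the sketch below collapses a per-device
offset list into a [start, end] pair. The names OffsetRangeSketch and toRanges
are hypothetical and are not taken from the IoTDB code base. It assumes each
device's offsets are already sorted in ascending order, as in the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OffsetRangeSketch {
    // Hypothetical helper: keep only the first and last offset of each device.
    // Assumes every offset list is sorted in ascending order.
    static Map<String, long[]> toRanges(Map<String, List<Long>> offsetsByDevice) {
        Map<String, long[]> ranges = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> e : offsetsByDevice.entrySet()) {
            List<Long> offsets = e.getValue();
            if (offsets.isEmpty()) {
                continue; // nothing to index for this device
            }
            long start = offsets.get(0);
            long end = offsets.get(offsets.size() - 1);
            ranges.put(e.getKey(), new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> full = new LinkedHashMap<>();
        full.put("d0", Arrays.asList(0L, 1L, 2L));
        full.put("d1", Arrays.asList(3L, 4L, 5L));
        for (Map.Entry<String, long[]> e : toRanges(full).entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

The pair only describes the device's metadata correctly if nothing belonging to
another device lies between the start and end offsets.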

And finally, it could be replaced by:
TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
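
A minimal sketch of how a reader might use a start-offset-only index follows.
It is not the IoTDB implementation: the names StartOffsetIndexSketch, rangeOf
and metadataEnd are hypothetical. It assumes the metadata of each device is
stored contiguously, so a device's region ends where the next device's region
starts, and the last region ends at metadataEnd.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class StartOffsetIndexSketch {
    // Hypothetical lookup: returns {start, endExclusive} for a device,
    // or null if the device is not in the index.
    // Assumption: per-device metadata is contiguous on disk, so the region
    // ends at the smallest start offset greater than this device's start.
    static long[] rangeOf(Map<String, Long> startByDevice, String deviceId, long metadataEnd) {
        Long start = startByDevice.get(deviceId);
        if (start == null) {
            return null;
        }
        long end = metadataEnd;
        for (long other : startByDevice.values()) {
            if (other > start && other < end) {
                end = other;
            }
        }
        return new long[] {start, end};
    }

    public static void main(String[] args) {
        Map<String, Long> index = new LinkedHashMap<>();
        index.put("d0", 0L);
        index.put("d1", 3L);
        // 6 is an assumed end position of the whole metadata region.
        System.out.println(Arrays.toString(rangeOf(index, "d0", 6L)));
        System.out.println(Arrays.toString(rangeOf(index, "d1", 6L)));
    }
}
```

With such a range, the reader can seek once to the start and read sequentially
up to the end, rather than seeking to each TimeseriesMetadata separately.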

Thanks,
—————————————————
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院


On Tue, Feb 11, 2020 at 7:59 PM, atoiLiu <[email protected]> wrote:

> Hi,
>
> Thank you for your reply.
> I am very glad that you are taking my suggestion.
>
>
> Thanks
>
> Dawei Liu
>
>
> > On Feb 11, 2020, at 6:04 PM, Haonan Hou <[email protected]> wrote:
> >
> > Hi Dawei,
> >
> > Thank you so much for sharing your opinion about the new TsFile!
> > I am very happy to take your suggestions.
> >
> > You said we can remove TsOffsetArray and directly store the offsets of
> > TimeseriesMetaData. I agree with you. It is better than my version.
> > Besides, for the optimization of TimeseriesMetaData, I would like to
> > discuss with other people to determine which way is better.
> >
> > Best,
> >
> > Haonan Hou
> >
> >
> >> On Feb 11, 2020, at 5:35 PM, atoiLiu <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I’m studying the new TsFile in PR [1], but I think TsFileMetaData is
> >> poorly designed.
> >>
> >> TsFileMetaData has a TsOffsetArray, which records the offset of every
> >> TimeseriesMetaData, and uses a Map<deviceId, int[]> to record each
> >> device's startIndex and endIndex in TsOffsetArray. It looks like this:
> >>
> >> TsFileMetaData ---> { [0,1,2,3,4,5, …], [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ] }
> >>
> >> We can delete TsOffsetArray and store the offsets directly in the
> >> deviceIndexArray, so that TsFileMetaData has a Map<deviceId, List<Long>>
> >> to record them. This change will save 4 bytes per device on disk, because
> >> each device only needs to record the number of offsets and the offsets
> >> themselves. It looks like this:
> >>
> >> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
> >>
> >>
> >> In addition, TimeSeriesMetaData is stored in order on disk, and the
> >> TimeSeriesMetaData entries of each device are stored together, so
> >> TsFileMetaData does not need to store every offset. There are two
> >> possible optimization directions:
> >>
> >> 1. Save the startTime, endTime and offset of each TimeSeriesMetaData in
> >> TsFileMetaData. The advantage is that once TsFileMetaData has been read
> >> from disk, you can apply a filter directly to skip the TimeSeriesMetaData
> >> that do not need to be read.
> >>
> >>
> >> 2. Save only the offset of each device's first TimeSeriesMetaData in
> >> TsFileMetaData, so that you can read through the rest sequentially with
> >> a single seek. It looks like this:
> >>
> >> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
> >>
> >>
> >>
> >> [1] https://github.com/apache/incubator-iotdb/pull/736
> >>
> >> Thanks
> >>
> >> Dawei Liu
> >
>
>
