Hi,

We have a newer design of TsFile, which combines the suggestions from Jialin 
and Dawei.

The main differences are as follows:

1. Remove TsOffsetArray.
2. Modify the device map in TsFileMetaData to store, for each device, the
start offset of its first TimeseriesMetadata and the total data size of all
its TimeseriesMetadata entries.


The newer TsFile structure should look like this:
[image: diagram of the new TsFile structure]

Here is an example of how the new structure works.

When we try to get the List<ChunkMetadata> of timeseries "d0.s1", we first
deserialize the map in TsFileMetadata. This gives us the startOffset of
TimeseriesMetadata "s0", which is the first TimeseriesMetadata of "d0", and
the total data size of all TimeseriesMetadata entries in "d0".

After that, we are able to deserialize all TimeseriesMetadata in “d0”.

Finally we have the TimeseriesMetadata "d0.s1" and can get the ChunkMetadata 
List of "d0.s1".
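
The lookup steps above can be sketched roughly as follows. This is only a
minimal illustration, not the actual TsFile code: the class names
(DeviceIndexSketch, DeviceEntry, TimeseriesMeta), the toy byte layout, and the
helper methods are all assumptions made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DeviceIndexSketch {
    // Hypothetical device map entry: start offset of the first
    // TimeseriesMetadata plus total size of all of them for one device.
    static final class DeviceEntry {
        final long startOffset;
        final int totalSize;
        DeviceEntry(long startOffset, int totalSize) {
            this.startOffset = startOffset;
            this.totalSize = totalSize;
        }
    }

    // Hypothetical stand-in for TimeseriesMetadata.
    static final class TimeseriesMeta {
        final String measurement;
        final List<Long> chunkMetadataOffsets;
        TimeseriesMeta(String measurement, List<Long> chunkMetadataOffsets) {
            this.measurement = measurement;
            this.chunkMetadataOffsets = chunkMetadataOffsets;
        }
    }

    // Toy layout (an assumption): [nameLen:int][name][count:int][count x long]
    static void write(ByteBuffer buf, String name, long... chunkOffsets) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        buf.putInt(n.length).put(n).putInt(chunkOffsets.length);
        for (long o : chunkOffsets) buf.putLong(o);
    }

    static TimeseriesMeta read(ByteBuffer buf) {
        byte[] name = new byte[buf.getInt()];
        buf.get(name);
        int count = buf.getInt();
        List<Long> offsets = new ArrayList<>();
        for (int i = 0; i < count; i++) offsets.add(buf.getLong());
        return new TimeseriesMeta(new String(name, StandardCharsets.UTF_8), offsets);
    }

    // Builds a tiny in-memory "file" with d0 = {s0, s1, s2} and d1 = {s0},
    // filling the device map as each device's region is written.
    static ByteBuffer buildSample(Map<String, DeviceEntry> deviceMap) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        int start = buf.position();
        write(buf, "s0", 100L);
        write(buf, "s1", 200L, 300L);
        write(buf, "s2", 400L);
        deviceMap.put("d0", new DeviceEntry(start, buf.position() - start));
        start = buf.position();
        write(buf, "s0", 500L);
        deviceMap.put("d1", new DeviceEntry(start, buf.position() - start));
        buf.flip();
        return buf;
    }

    // One seek to startOffset, then deserialize entries until totalSize bytes
    // are consumed; returns null if the device or measurement is not found.
    static TimeseriesMeta find(ByteBuffer file, Map<String, DeviceEntry> deviceMap,
                               String device, String measurement) {
        DeviceEntry e = deviceMap.get(device);
        if (e == null) return null;
        int start = Math.toIntExact(e.startOffset);
        ByteBuffer region = file.duplicate();
        region.limit(start + e.totalSize);
        region.position(start);
        while (region.hasRemaining()) {
            TimeseriesMeta m = read(region);
            if (m.measurement.equals(measurement)) return m;
        }
        return null;
    }
}
```

With the sample data, find(file, map, "d0", "s1") reads only the region that
belongs to "d0" and returns the entry whose chunk metadata offsets are 200 and
300.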

Thanks,

Haonan Hou


On Feb 11, 2020, at 8:08 PM, Jialin Qiao wrote:

Hi,

If each device stores only the offsets of its TimeseriesMetadata, like this:
TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]

it could be simplified to record only the start offset and end offset:
TsFileMetaData ---> [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]

And finally, it could be replaced by:
TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
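
To illustrate why the start value alone can be enough, here is a small sketch
that recovers each device's [start, end] range from the start values only. It
assumes that the devices are kept in file order, that each device's entries
are contiguous, and that the total number of entries is known; the class and
method names are hypothetical and not part of TsFile.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StartOnlyIndexSketch {
    // Rebuilds the inclusive [start, end] range of every device.
    // The end of one device is simply the start of the next device minus one;
    // the last device ends at totalCount - 1.
    static Map<String, int[]> ranges(LinkedHashMap<String, Integer> starts, int totalCount) {
        Map<String, int[]> out = new LinkedHashMap<>();
        String prevDevice = null;
        int prevStart = 0;
        for (Map.Entry<String, Integer> e : starts.entrySet()) {
            if (prevDevice != null) {
                out.put(prevDevice, new int[] {prevStart, e.getValue() - 1});
            }
            prevDevice = e.getKey();
            prevStart = e.getValue();
        }
        if (prevDevice != null) {
            out.put(prevDevice, new int[] {prevStart, totalCount - 1});
        }
        return out;
    }
}
```

For the example above, starts {d0: 0, d1: 3} with 6 entries in total gives
back [0,2] for d0 and [3,5] for d1.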

Thanks,
—————————————————
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院


On Tue, Feb 11, 2020, at 7:59 PM, atoiLiu wrote:

Hi,

Thank you for your reply.
I am very happy that you can take my suggestion.


Thanks

Dawei Liu


On Feb 11, 2020, at 6:04 PM, Haonan Hou wrote:

Hi Dawei,

Thank you so much for sharing your opinion about the new TsFile!
I am very happy to take your suggestions.

You suggested removing TsOffsetArray and storing the offsets of
TimeseriesMetaData directly. I agree with you. It is better than my version.
As for the optimization of TimeseriesMetaData, I would like to discuss it
with other people to determine which way is better.

Best,

Haonan Hou


On Feb 11, 2020, at 5:35 PM, atoiLiu wrote:

Hi,

I'm studying the new TsFile in PR [1], and I think TsFileMetaData is poorly
designed.

TsFileMetaData has a TsOffsetArray, which records the offset of every
TimeseriesMetaData, and uses a Map<deviceId, int[]> to record each device's
startIndex and endIndex in the TsOffsetArray. It looks like this:

TsFileMetaData ---> { [0,1,2,3,4,5, …], [ {deviceId(d0), [0,2] },
{deviceId(d1), [3,5] }, … ] }

We can delete the TsOffsetArray and store the offsets directly in the
deviceIndexArray, so that TsFileMetaData has a Map<deviceId, List<Long>>
instead. This change saves 4 bytes per device on disk, because each device
only needs to record the number of offsets and the offsets themselves. It
looks like this:

TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] },
… ]


In addition, TimeSeriesMetaData is stored in order on disk, and the
TimeSeriesMetaData entries of each device are stored next to each other, so
TsFileMetaData does not need to store every offset. There are two possible
optimization directions:

1. Save the startTime, endTime and offset of each TimeSeriesMetaData in
TsFileMetaData. The benefit is that after reading TsFileMetaData from disk,
you can directly filter out the TimeSeriesMetaData entries that do not need
to be read.


2. Save only the offset of the first TimeSeriesMetaData of each device in
TsFileMetaData, so that you can read through the entries sequentially with a
single seek. It looks like this:

TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
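
As a rough illustration of the first direction, the filter could look like
the sketch below. The Entry class, its fields, and the offsetsToRead method
are hypothetical names chosen for this example, and the time ranges are
assumed to be inclusive on both ends.

```java
import java.util.ArrayList;
import java.util.List;

final class TimeRangeFilterSketch {
    // Hypothetical index entry kept in TsFileMetaData for one TimeSeriesMetaData.
    static final class Entry {
        final long startTime;
        final long endTime;
        final long offset;
        Entry(long startTime, long endTime, long offset) {
            this.startTime = startTime;
            this.endTime = endTime;
            this.offset = offset;
        }
    }

    // Returns the offsets of the entries whose [startTime, endTime] overlaps
    // the query range; all other entries never need to be read from disk.
    static List<Long> offsetsToRead(List<Entry> entries, long queryStart, long queryEnd) {
        List<Long> out = new ArrayList<>();
        for (Entry e : entries) {
            if (e.endTime >= queryStart && e.startTime <= queryEnd) {
                out.add(e.offset);
            }
        }
        return out;
    }
}
```

The trade-off is that this keeps two extra long values per TimeSeriesMetaData
in TsFileMetaData, whereas the second direction keeps a single offset per
device.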



[1] https://github.com/apache/incubator-iotdb/pull/736

Thanks

Dawei Liu



