Hi,
I'm studying the new TsFile format in PR [1], and I think the design of TsFileMetaData can be improved.
TsFileMetaData holds a TsOffsetArray, which records the offset of every
TimeseriesMetaData, and uses a Map<deviceId, int[]> to record each device's
startIndex and endIndex into TsOffsetArray. It looks like this:
TsFileMetaData ---> { [0,1,2,3,4,5, …], [ {deviceId(d0), [0,2]}, {deviceId(d1),
[3,5]}, … ] }
We can delete TsOffsetArray and store the offsets directly in the
deviceIndexArray, so that TsFileMetaData has a Map<deviceId, List<Long>> to
record them. This change saves 4 bytes per device on disk, because each device
only needs to record the number of its offsets and the offsets themselves,
instead of a startIndex and an endIndex. It looks like this:
TsFileMetaData ---> [ {deviceId(d0), [0,1,2]}, {deviceId(d1), [3,4,5]}, … ]
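To make the comparison concrete, here is a minimal Java sketch of both lookups. The class and method names are made up for illustration and are not the TsFile API; it assumes endIndex is inclusive, as the example above suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not the actual TsFile classes.
class DeviceOffsetLayouts {

  // Current layout: one shared offset array plus an inclusive
  // [startIndex, endIndex] pair per device.
  static List<Long> lookupCurrent(
      long[] tsOffsetArray, Map<String, int[]> deviceIndex, String deviceId) {
    List<Long> offsets = new ArrayList<>();
    int[] range = deviceIndex.get(deviceId);
    if (range == null) {
      return offsets; // unknown device: nothing to read
    }
    for (int i = range[0]; i <= range[1]; i++) {
      offsets.add(tsOffsetArray[i]);
    }
    return offsets;
  }

  // Proposed layout: the offsets are stored directly under each device.
  static List<Long> lookupProposed(
      Map<String, List<Long>> deviceOffsets, String deviceId) {
    List<Long> offsets = deviceOffsets.get(deviceId);
    return offsets == null ? Collections.<Long>emptyList() : offsets;
  }

  public static void main(String[] args) {
    // Same placeholder values as in the example above.
    long[] tsOffsetArray = {0, 1, 2, 3, 4, 5};
    Map<String, int[]> deviceIndex = new LinkedHashMap<>();
    deviceIndex.put("d0", new int[] {0, 2});
    deviceIndex.put("d1", new int[] {3, 5});

    Map<String, List<Long>> deviceOffsets = new LinkedHashMap<>();
    deviceOffsets.put("d0", Arrays.asList(0L, 1L, 2L));
    deviceOffsets.put("d1", Arrays.asList(3L, 4L, 5L));

    // Both lookups yield the same offsets for d1.
    System.out.println(lookupCurrent(tsOffsetArray, deviceIndex, "d1"));
    System.out.println(lookupProposed(deviceOffsets, "d1"));
  }
}
```

Both layouts answer the same question; the proposed one just skips the indirection through the shared array.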
In addition, TimeSeriesMetaData is stored in order on disk, and the
TimeSeriesMetaData entries of each device are stored next to each other, so
TsFileMetaData does not need to store every offset. There are two possible
optimization directions:
1. Save the startTime, endTime and offset of each TimeSeriesMetaData in
TsFileMetaData. The benefit is that once TsFileMetaData has been read from
disk, we can apply a time filter directly and skip every TimeSeriesMetaData
that does not need to be read.
2. Save only the offset of the first TimeSeriesMetaData of each device in
TsFileMetaData, so that we seek only once and then read through the device's
TimeSeriesMetaData sequentially. It looks like this:
TsFileMetaData ---> [ {deviceId(d0), 0}, {deviceId(d1), 3}, … ]
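A rough Java sketch of how the two directions might be used. All names here are hypothetical. For direction 2 it assumes that each device's TimeSeriesMetaData ends where the next device's begins (or at the end of the metadata region, passed in as regionEnd), which is my reading of the on-disk ordering and not something I have checked in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not the actual TsFile classes.
class MetadataIndexSketch {

  // Direction 1: one index entry per TimeSeriesMetaData.
  static final class TimeRangeEntry {
    final long startTime;
    final long endTime;
    final long offset;

    TimeRangeEntry(long startTime, long endTime, long offset) {
      this.startTime = startTime;
      this.endTime = endTime;
      this.offset = offset;
    }
  }

  // Keep only the offsets whose [startTime, endTime] overlaps the
  // query range [queryStart, queryEnd]; the rest need not be read.
  static List<Long> offsetsToRead(
      List<TimeRangeEntry> entries, long queryStart, long queryEnd) {
    List<Long> offsets = new ArrayList<>();
    for (TimeRangeEntry e : entries) {
      if (e.startTime <= queryEnd && e.endTime >= queryStart) {
        offsets.add(e.offset);
      }
    }
    return offsets;
  }

  // Direction 2: only the first offset per device is stored. Returns
  // {start, end} (end exclusive) of the span to scan after one seek,
  // or null if the device is unknown.
  static long[] scanRange(
      Map<String, Long> startOffsets, String deviceId, long regionEnd) {
    Long start = startOffsets.get(deviceId);
    if (start == null) {
      return null;
    }
    long end = regionEnd;
    // The span ends at the smallest start offset greater than ours.
    for (long other : startOffsets.values()) {
      if (other > start && other < end) {
        end = other;
      }
    }
    return new long[] {start, end};
  }
}
```

Direction 1 costs more space in TsFileMetaData but can skip reads entirely; direction 2 is the smallest index but always scans the whole device.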
[1] https://github.com/apache/incubator-iotdb/pull/736
Thanks
Dawei Liu