Hi,

I'm reading the new TsFile format in PR [1], and I think the design of
TsFileMetaData could be improved.

TsFileMetaData has a TsOffsetArray, which records the offset of every
TimeseriesMetaData, and uses a Map<deviceId, int[]> to record each device's
startIndex and endIndex into TsOffsetArray. It looks like this:

TsFileMetaData ---> { [0,1,2,3,4,5, ….],
                      [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, …. ] }

We could delete TsOffsetArray and store the offsets directly in the
deviceIndexArray, so that TsFileMetaData holds a Map<deviceId, List<Long>>.
This change would save 4 bytes per device on disk, because each device would
only need to record the number of its offsets (one int) followed by the
offsets themselves, instead of a startIndex and an endIndex (two ints). It
would look like this:

TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
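
To make the 4-byte saving concrete, here is a rough sketch that compares the
per-device cost of the two layouts. It is only an illustration under my own
assumptions, not the code in the PR: indexes and counts are 4-byte ints,
offsets are 8-byte longs, the deviceId costs the same in both layouts and is
ignored, and so is the length prefix of the shared array. The class and method
names are made up for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of IoTDB: estimates index size for both layouts.
class MetadataLayoutSize {

    // Current layout: one shared offset array plus (startIndex, endIndex) per device.
    static long currentLayoutBytes(Map<String, List<Long>> offsetsByDevice) {
        long bytes = 0;
        for (List<Long> offsets : offsetsByDevice.values()) {
            bytes += 2L * Integer.BYTES;                  // startIndex + endIndex
            bytes += (long) offsets.size() * Long.BYTES;  // this device's slots in the shared array
        }
        return bytes;
    }

    // Proposed layout: per device, a count followed by the offsets themselves.
    static long proposedLayoutBytes(Map<String, List<Long>> offsetsByDevice) {
        long bytes = 0;
        for (List<Long> offsets : offsetsByDevice.values()) {
            bytes += Integer.BYTES;                       // number of offsets
            bytes += (long) offsets.size() * Long.BYTES;  // the offsets
        }
        return bytes;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> offsetsByDevice = new LinkedHashMap<>();
        offsetsByDevice.put("d0", Arrays.asList(0L, 1L, 2L));
        offsetsByDevice.put("d1", Arrays.asList(3L, 4L, 5L));
        long saved = currentLayoutBytes(offsetsByDevice) - proposedLayoutBytes(offsetsByDevice);
        System.out.println("bytes saved: " + saved); // 4 bytes for each of the 2 devices
    }
}
```

With the two devices from the example, the difference is one int per device.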


In addition, TimeSeriesMetaData is stored in order on disk, and the
TimeSeriesMetaData entries of each device are stored next to each other, so
TsFileMetaData does not need to store every offset. That leaves two possible
optimization directions:

1. Save the startTime, endTime and offset of each TimeSeriesMetaData in
TsFileMetaData. The advantage is that, once TsFileMetaData has been read from
disk, a time filter can be applied directly to decide which
TimeSeriesMetaData do not need to be read.
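
As a rough illustration of direction 1, the filter could look something like
the sketch below. IndexEntry and offsetsToRead are names I made up for this
example, and I am assuming the query is a closed time range [queryStart,
queryEnd]; it is not the actual TsFile reader code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of IoTDB: time-range pruning on index entries.
class TimeRangeIndexSketch {

    // One entry per TimeSeriesMetaData, as it would be kept in TsFileMetaData.
    static final class IndexEntry {
        final long startTime;
        final long endTime;
        final long offset;

        IndexEntry(long startTime, long endTime, long offset) {
            this.startTime = startTime;
            this.endTime = endTime;
            this.offset = offset;
        }
    }

    // Returns only the offsets whose [startTime, endTime] overlaps the query range.
    static List<Long> offsetsToRead(List<IndexEntry> entries, long queryStart, long queryEnd) {
        List<Long> result = new ArrayList<>();
        for (IndexEntry entry : entries) {
            boolean overlaps = entry.endTime >= queryStart && entry.startTime <= queryEnd;
            if (overlaps) {
                result.add(entry.offset);
            }
        }
        return result;
    }
}
```

Entries that fall entirely outside the query range are skipped without
touching the disk again.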


2. Save only the offset of the first TimeSeriesMetaData of each device in
TsFileMetaData, so that a reader can seek once and then read through the
device's TimeSeriesMetaData sequentially. It would look like this:

TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
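
A minimal sketch of how direction 2 might be read, with a ByteBuffer standing
in for the file. Everything about the format here is my assumption for the
sake of the example: each entry is an int length followed by a payload, and a
device's region ends where the next device's start offset begins (or at the
end of the metadata area). The real TimeSeriesMetaData serialization is
different.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of IoTDB: one seek, then a sequential scan.
class SequentialMetadataScan {

    // Reads all length-prefixed entries stored in [start, end).
    static List<String> readEntries(ByteBuffer file, int start, int end) {
        List<String> entries = new ArrayList<>();
        file.position(start);                 // the single "seek"
        while (file.position() < end) {       // then read forward only
            int length = file.getInt();
            byte[] payload = new byte[length];
            file.get(payload);
            entries.add(new String(payload, StandardCharsets.UTF_8));
        }
        return entries;
    }

    // Writes entries back to back, the way one device's metadata is assumed to be laid out.
    static ByteBuffer writeEntries(String... names) {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        for (String name : names) {
            byte[] payload = name.getBytes(StandardCharsets.UTF_8);
            buffer.putInt(payload.length);
            buffer.put(payload);
        }
        buffer.flip();
        return buffer;
    }
}
```

For example, if d0 owns the first two entries and d1 the rest, reading d0
means calling readEntries with d0's start offset and d1's start offset as the
end. No per-entry offsets are needed.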



[1] https://github.com/apache/incubator-iotdb/pull/736 

Thanks

Dawei Liu
