Hi,

I'm considering a tree-structured, multi-level index, as described in this JIRA issue [1]:

• Each internal node is an array of three-field entries: [<device / measurement name, offset, next>, ...]. Each entry represents one child node of the internal node.
• The array ends with an "empty" entry whose name is the empty string and whose offset equals the end offset of the last child, so that the length of every child can be computed easily from two consecutive offsets.
• The field next is an enum value that gives the type of the child node: di means the device index, d means the device, mi means the measurement index, and m means the measurement. Only m is a leaf node, which holds the TimeseriesMetadata.
• The maximum number of child nodes, N, can be configured by users. (In the examples [2], I set N = 10 for convenience.)
• The index is written bottom-up. Whenever the number of blocks exceeds N, a parent-level index is generated and persisted to disk.
• Queries run top-down, using binary search within each array.

I present some examples in the attachment [3]:

• 5 devices with 5 measurements each;
• 1 device with 150 measurements;
• 150 devices with 1 measurement each;
• 150 devices with 150 measurements each.

What do you think about this idea? I'd be very pleased to modify or supplement this proposal if there are any problems : )

[1] https://issues.apache.org/jira/browse/IOTDB-605
[2] https://issues.apache.org/jira/secure/attachment/13000474/Structure%20of%20MetadataIndex%20levels.png
[3] https://issues.apache.org/jira/secure/attachment/13000479/Examples.png

Best,
-----------------------------------
Zesong Sun (孙泽嵩)
School of Software, Tsinghua University

> On April 17, 2020, at 20:46, Jialin Qiao <[email protected]> wrote:
>
> Hi,
>
> I met a scenario in which one device has 300k measurements.
>
> When we read one time series in a TsFile, we need to deserialize 300k
> TimeseriesMetadata, which costs about 250 ms (just for reading the metadata of
> one TsFile). This may make queries much slower.
>
> As this scenario is not rare, I think it should be optimized by adding more
> indexes to TsFileMetadata.
>
> Thanks,
> --
> Jialin Qiao (乔嘉林)
> School of Software, Tsinghua University
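
P.S. To make the bottom-up build and the top-down query from my proposal more concrete, here is a minimal Python sketch. It is only an illustration under several assumptions of mine, not the actual TsFile code: the names Entry, build_index and lookup are hypothetical; in-memory references stand in for file offsets (so the trailing "empty" entry is omitted); only the measurement levels (mi and m) of a single device are modeled; and I assume each entry's name is the smallest name covered by its child.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union

MAX_CHILDREN = 10  # N in the proposal (hypothetical default, as in the examples)


@dataclass
class Entry:
    name: str                          # smallest name covered by the child (assumption)
    child: Union[str, List["Entry"]]   # stand-in for the on-disk offset
    next: str                          # "mi": child is an index node, "m": leaf


def build_index(names: List[str], max_children: int = MAX_CHILDREN) -> List[Entry]:
    """Build bottom-up: while a level has more than N entries, add a parent level."""
    level = [Entry(n, f"TimeseriesMetadata({n})", "m") for n in sorted(names)]
    while len(level) > max_children:
        # Cut the current level into nodes of at most N entries each.
        nodes = [level[i:i + max_children]
                 for i in range(0, len(level), max_children)]
        # One parent entry per node, keyed by the node's first name.
        level = [Entry(node[0].name, node, "mi") for node in nodes]
    return level  # the root node


def lookup(root: List[Entry], name: str) -> Optional[str]:
    """Query top-down, with one binary search per visited node."""
    node = root
    while True:
        keys = [e.name for e in node]
        i = bisect_right(keys, name) - 1  # last entry whose name <= target
        if i < 0:
            return None                   # target sorts before every entry
        entry = node[i]
        if entry.next == "m":
            return entry.child if entry.name == name else None
        node = entry.child                # descend one level


# 1 device with 150 measurements, N = 10: levels of 150 -> 15 -> 2 entries.
root = build_index([f"s{i:03d}" for i in range(150)])
print(lookup(root, "s123"))  # prints TimeseriesMetadata(s123)
```

With N = 10 and 150 measurements, this lookup visits three nodes (2, 5 and 10 entries), so a reader would touch only a few small arrays instead of deserializing every TimeseriesMetadata.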
