Hi,

Great job! This should avoid reading too many TimeseriesMetadata objects at a time.
Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -----Original Message-----
> From: "孙泽嵩" <[email protected]>
> Sent: 2020-04-20 15:08:28 (Monday)
> To: [email protected]
> Cc:
> Subject: Re: [Discuss] Optimize TsFile Metadata
>
> Hi,
>
> I'm considering a tree-structured level index, as described in this
> JIRA issue [1]:
>
> • Each internal node is an array of triple-element items:
>   [<device / measurement name, offset, next>, ... ]. These items represent
>   the child nodes of the internal node.
> • At the end of the array there is an "empty" item with an empty string as
>   its name and an offset equal to the end offset of the child elements, so
>   that the length of every element can be calculated easily.
> • The field next is an enum value that gives the type of the next child
>   node. For example, di means the device index, d means the device, mi
>   means the measurement index, and m means the measurement. Only m is a
>   leaf node with TimeseriesMetadata.
> • The maximum number of child nodes, N, can be configured by users. (In
>   the examples [2], I set N = 10 for convenience.)
> • The storage process is bottom-up. Whenever the number of blocks exceeds
>   N, a parent level index is generated and written to disk.
> • The query process is top-down, with a binary search of each array.
>
> I present some examples in the attachment [3]: 5 devices with 5 measurements
> each; 1 device with 150 measurements; 150 devices with 1 measurement each;
> 150 devices with 150 measurements each.
>
> What do you think about this idea? I'd be very pleased to modify or
> supplement this proposal if there are any problems : )
>
>
> [1] https://issues.apache.org/jira/browse/IOTDB-605
> [2] https://issues.apache.org/jira/secure/attachment/13000474/Structure%20of%20MetadataIndex%20levels.png
> [3] https://issues.apache.org/jira/secure/attachment/13000479/Examples.png
>
> Best,
> -----------------------------------
> Zesong Sun
> School of Software, Tsinghua University
>
> 孙泽嵩
> 清华大学 软件学院
>
> > On April 17, 2020, at 20:46, Jialin Qiao <[email protected]> wrote:
> >
> > Hi,
> >
> > I encountered a scenario in which one device has 300k measurements.
> >
> > When we read one time series in a TsFile, we need to deserialize 300k
> > TimeseriesMetadata, which costs about 250ms (just for reading the metadata
> > of one TsFile). This can make queries much slower.
> >
> > As this scenario is not rare, I think this should be optimized by adding
> > more indexes in TsFileMetadata.
> >
> > Thanks,
> > --
> > Jialin Qiao
> > School of Software, Tsinghua University
> >
> > 乔嘉林
> > 清华大学 软件学院
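
For readers following the thread, the level index proposed in the quoted message can be sketched roughly as below. This is only an illustration under several assumptions, not the IoTDB implementation: it covers the measurement side of a single device (next values "mi" and "m" only), it simulates the file as a Python list whose slot numbers stand in for byte offsets of index nodes, and the names Entry, build_index and lookup are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Entry:
    name: str    # measurement name; "" marks the terminating "empty" item
    offset: int  # start of the child (metadata byte offset, or slot in the simulated disk)
    next: str    # child type: "mi" = measurement index node, "m" = TimeseriesMetadata


def build_index(leaves, end_offset, max_children):
    """Build the index bottom-up.

    leaves: name-sorted list of (name, offset) for each TimeseriesMetadata.
    end_offset: offset just past the last TimeseriesMetadata.
    Returns (disk, root_slot), where disk is a list of nodes.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    disk = []  # simulated file: one index node per slot
    entries = [Entry(name, off, "m") for name, off in leaves]
    end = end_offset
    while True:
        parents = []
        for start in range(0, len(entries), max_children):
            stop = start + max_children
            group = entries[start:stop]
            # The terminating item records where the last child of this node ends.
            child_end = entries[stop].offset if stop < len(entries) else end
            disk.append(group + [Entry("", child_end, "")])
            parents.append(Entry(group[0].name, len(disk) - 1, "mi"))
        if len(parents) == 1:
            return disk, parents[0].offset
        # More than one node on this level: generate a parent level.
        entries, end = parents, len(disk)


def lookup(disk, root_slot, name):
    """Search top-down; return (offset, length) of the metadata, or None."""
    node = disk[root_slot]
    while True:
        keys = [e.name for e in node[:-1]]  # exclude the terminating item
        i = bisect_right(keys, name) - 1    # rightmost entry with key <= name
        if i < 0:
            return None
        entry = node[i]
        if entry.next == "m":
            if entry.name != name:
                return None
            # Length comes from the following item (possibly the terminator).
            return entry.offset, node[i + 1].offset - entry.offset
        node = disk[entry.offset]
```

With 25 measurements of 100 bytes each and N = 10, this produces three leaf-level index nodes plus one root node, so a lookup touches two small arrays instead of deserializing all 25 entries.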
