Re: [Discuss] Optimize TsFile Metadata

Jialin Qiao Mon, 20 Apr 2020 01:14:22 -0700

Hi,

Great job! This should avoid reading too many TimeseriesMetadatas at a time.


Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -----Original Message-----
> From: "孙泽嵩" <[email protected]>
> Sent: 2020-04-20 15:08:28 (Monday)
> To: [email protected]
> Cc: 
> Subject: Re: [Discuss] Optimize TsFile Metadata
> 
> Hi,
> 
> I'm considering a tree-structured index with multiple levels, as I describe 
> in this JIRA issue [1]:
> 
> • Each internal node is an array of three-element items: 
> [<device / measurement name, offset, next>, ... ]. These items represent 
> the child nodes of the internal node.
> • At the end of the array there is an "empty" item with an empty string whose 
> offset equals the end offset of the child elements, so that the length 
> of every element is easy to calculate.
> • The field next is an enum value that represents the type of the next 
> child node. For example, di means the device index, d means the device, mi 
> means the measurement index, and m means the measurement. Only m is a leaf 
> node, which holds the TimeseriesMetadata.
> • The maximum number of child nodes, N, can be configured by users. (In 
> the examples [2], I set N = 10 for convenience.)
> • The index is built bottom-up: whenever the number of blocks exceeds N, a 
> parent-level index is generated and persisted to disk.
> • Queries proceed top-down, using binary search within each array.
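
A minimal in-memory sketch of the scheme in the list above, written in Python for brevity. The names Leaf, IndexNode, build_index and lookup are hypothetical and are not taken from the IoTDB code base. Disk offsets, the trailing "empty" item and the di/d/mi/m node types are left out; as a simplifying assumption, each internal node just keeps the smallest name found under each of its children.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """Stands in for one serialized TimeseriesMetadata (node type m)."""
    name: str


@dataclass
class IndexNode:
    """Internal node: keys[i] is the smallest name under children[i]."""
    keys: List[str]
    children: List[Union["IndexNode", Leaf]]


def first_key(node):
    """Smallest name covered by a node."""
    return node.name if isinstance(node, Leaf) else node.keys[0]


def build_index(names, n):
    """Build bottom-up: group each level into blocks of at most n children
    and add a parent level for as long as a level has more than n blocks."""
    level = [Leaf(name) for name in sorted(names)]
    while len(level) > n:
        groups = [level[i:i + n] for i in range(0, len(level), n)]
        level = [IndexNode([first_key(c) for c in g], g) for g in groups]
    return IndexNode([first_key(c) for c in level], level)


def lookup(root, name):
    """Search top-down, with one binary search per internal node."""
    node = root
    while isinstance(node, IndexNode):
        # Rightmost child whose smallest name is <= the requested name.
        i = bisect_right(node.keys, name) - 1
        if i < 0:
            return None  # name sorts before everything in the index
        node = node.children[i]
    return node if node.name == name else None


# One device with 150 measurements, N = 10 (hypothetical names s0..s149).
root = build_index([f"s{i}" for i in range(150)], n=10)
print(lookup(root, "s42"))
print(lookup(root, "missing"))
```

With this layout a lookup touches one node per level and ends at a single leaf, rather than walking over every entry of the device.
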
> 
> I present some examples in the attachment [3] : 5 devices with 5 measurements 
> each; 1 device with 150 measurements; 150 devices with 1 measurement each; 
> 150 devices with 150 measurements each.
> 
> What do you think about this idea? I'd be very pleased to modify or 
> supplement this proposal if there are any problems : )
> 
> 
> [1] https://issues.apache.org/jira/browse/IOTDB-605
> [2] https://issues.apache.org/jira/secure/attachment/13000474/Structure%20of%20MetadataIndex%20levels.png
> [3] https://issues.apache.org/jira/secure/attachment/13000479/Examples.png
> 
> Best,
> -----------------------------------
> Zesong Sun
> School of Software, Tsinghua University
> 
> 孙泽嵩
> 清华大学 软件学院
> 
> > On April 17, 2020, at 20:46, Jialin Qiao <[email protected]> wrote:
> > 
> > Hi,
> > 
> > 
> > I've encountered a scenario in which one device has 300k measurements.
> > 
> > 
> > When we read one time series in a TsFile, we need to deserialize all 300k 
> > TimeseriesMetadata entries, which costs about 250 ms (just for reading the 
> > metadata of one TsFile). This can make queries much slower.
> > 
> > 
> > As this scenario is not rare, I think this should be optimized by adding 
> > more indexes in TsFileMetadata.
> > 
> > Thanks,
> > --
> > Jialin Qiao
> > School of Software, Tsinghua University
> > 
> > 乔嘉林
> > 清华大学 软件学院
>
