Re: [Discuss] Optimize TsFile Metadata

孙泽嵩 Mon, 20 Apr 2020 00:09:18 -0700

Hi,

I'm considering a tree-structured multi-level index, as described in this JIRA 
issue [1]:


• Each internal node is an array consisting of three-element items: 
[<device / measurement name, offset, next>, ... ]. These items represent the 
child nodes of the internal node.
• At the end of the array there is an "empty" item with an empty string as its 
name and an offset equal to the end offset of the child elements, so that the 
length of every element is easy to calculate.
• The field next is an enum value that represents the type of the child 
node. For example, di means the device index, d means the device, mi means the 
measurement index, and m means the measurement. Only m is a leaf node, holding 
TimeseriesMetadata.
• The maximum number of child nodes, N, can be configured by users. (In the 
examples [2], I set N = 10 for convenience.)
• Storage proceeds bottom-up. Whenever the number of blocks exceeds N, a parent 
level index is generated and persisted to disk.
• The query process is from top to bottom with binary search of the array.
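
To make the points above concrete, here is a minimal Python sketch of the idea. 
It is an illustration under several assumptions, not the TsFile format: it 
models only the measurement levels (m / mi) of a single device, serializes 
index nodes as JSON for readability, always writes at least one index node, and 
the names build, write_level and lookup are hypothetical.

```python
import json
from bisect import bisect_right

N = 10  # maximum children per index node (assumed value, as in the examples)


def write_level(disk, children, level_end, next_type):
    """Append index nodes covering `children` to `disk`.

    `children` is a sorted list of (name, offset) pairs stored contiguously,
    ending at `level_end`. Returns the (first_name, offset) pairs of the new
    nodes and the end offset of the newly written level.
    """
    parents = []
    for i in range(0, len(children), N):
        group = children[i:i + N]
        # End of the group's last child: start of the next child, or end of level.
        end = children[i + N][1] if i + N < len(children) else level_end
        items = [[name, off, next_type] for name, off in group]
        items.append(["", end, ""])  # trailing "empty" item carrying the end offset
        parents.append((group[0][0], len(disk)))
        disk += json.dumps(items).encode()
    return parents, len(disk)


def build(entries):
    """Write leaf payloads, then index levels bottom-up. Returns (disk, root_offset)."""
    if not entries:
        raise ValueError("this sketch needs at least one entry")
    disk = bytearray()
    children = []
    for name, payload in sorted(entries.items()):
        children.append((name, len(disk)))
        disk += payload
    end, next_type = len(disk), "m"
    while True:
        children, end = write_level(disk, children, end, next_type)
        next_type = "mi"
        if len(children) == 1:  # a single node remains: it is the root
            return bytes(disk), children[0][1]


def lookup(disk, root_offset, target):
    """Walk from the root down, binary-searching each node's item names."""
    start, end = root_offset, len(disk)
    while True:
        items = json.loads(disk[start:end])
        names = [item[0] for item in items[:-1]]  # skip the trailing empty item
        i = bisect_right(names, target) - 1  # last child whose name <= target
        if i < 0:
            return None
        name, start, next_type = items[i]
        end = items[i + 1][1]  # the next item's offset bounds this child
        if next_type == "m":
            return disk[start:end] if name == target else None


entries = {f"s{i}": f"meta-{i}".encode() for i in range(150)}
disk, root = build(entries)
print(lookup(disk, root, "s42"))
```

With 150 measurements and N = 10, this sketch writes 15 leaf-level index nodes, 
then 2 nodes above them, then a single root, so a lookup deserializes three 
small arrays rather than all 150 entries.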

I present some examples in the attachment [3] : 5 devices with 5 measurements 
each; 1 device with 150 measurements; 150 devices with 1 measurement each; 150 
devices with 150 measurements each.

What do you think about this idea? I'd be very pleased to modify or supplement 
this proposal if there are any problems : )


[1] https://issues.apache.org/jira/browse/IOTDB-605
[2] https://issues.apache.org/jira/secure/attachment/13000474/Structure%20of%20MetadataIndex%20levels.png
[3] https://issues.apache.org/jira/secure/attachment/13000479/Examples.png

Best,
-----------------------------------
Zesong Sun
School of Software, Tsinghua University

孙泽嵩
清华大学 软件学院

> On April 17, 2020, at 20:46, Jialin Qiao <[email protected]> wrote:
> 
> Hi,
> 
> 
> I encountered a scenario in which one device has 300k measurements.
> 
> 
> When we read one time series in a TsFile, we need to deserialize 300k 
> TimeseriesMetadata, which costs about 250ms (just for reading the metadata of 
> one TsFile). This may make queries much slower.
> 
> 
> As this scenario is not rare, I think this should be optimized by adding more 
> indexes in TsFileMetadata.
> 
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
> 
> 乔嘉林
> 清华大学 软件学院
