Re: Suggestions for new TsFile

Haonan Hou Thu, 13 Feb 2020 01:54:11 -0800

+1 
It’s a good idea. 

Thanks,


Haonan Hou

> On Feb 13, 2020, at 5:20 PM, Dawei Liu <atoi...@163.com> wrote:
> 
> Hi,
> 
> Sorry, I overlooked that the first step in the server is to filter files
> through the startTime/endTime map.
> 
> +1 for adding a Statistics to TimeseriesMetadata.
> 
> For example, with a Device Shadow, it is often necessary to find the latest
> information about a device.
> 
> Thanks
> 
> Dawei Liu
> 
>> On Feb 13, 2020, at 4:58 PM, Jialin Qiao <qj...@mails.tsinghua.edu.cn> wrote:
>> 
>> Hi,
>> 
>> I have a suggestion. 
>> We could add a Statistics in TimeseriesMetadata to support fast aggregations.
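A minimal Python sketch of this suggestion, assuming that per-chunk statistics can be merged into one series-level summary when the file is written. The `Statistics` fields and the `merge` and `series_statistics` helpers are hypothetical names for illustration, not the IoTDB API. The point is that count, min, max, or last-time queries over a whole series could then be answered from TimeseriesMetadata without reading any ChunkMetadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Statistics:
    """Pre-computed summary of a run of points (hypothetical field set)."""
    count: int
    start_time: int
    end_time: int
    min_value: float
    max_value: float

    def merge(self, other: "Statistics") -> "Statistics":
        # Combine two summaries without touching the underlying points.
        return Statistics(
            count=self.count + other.count,
            start_time=min(self.start_time, other.start_time),
            end_time=max(self.end_time, other.end_time),
            min_value=min(self.min_value, other.min_value),
            max_value=max(self.max_value, other.max_value),
        )


def series_statistics(chunk_stats: List[Statistics]) -> Statistics:
    """Fold per-chunk statistics into one series-level Statistics."""
    merged = chunk_stats[0]
    for stats in chunk_stats[1:]:
        merged = merged.merge(stats)
    return merged


# Two chunks of one series; the merged summary is what would be stored
# in TimeseriesMetadata under this assumption.
chunks = [
    Statistics(count=3, start_time=1, end_time=3, min_value=0.5, max_value=2.0),
    Statistics(count=2, start_time=4, end_time=5, min_value=1.0, max_value=7.5),
]
series = series_statistics(chunks)
```

Here `series.end_time` also gives the time of the last point, which is the kind of lookup the Device Shadow case needs.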
>> 
>> Thanks,
>> --
>> Jialin Qiao
>> School of Software, Tsinghua University
>> 
>> 乔嘉林
>> 清华大学 软件学院
>> 
>>> -----Original Message-----
>>> From: "Jialin Qiao" <qj...@mails.tsinghua.edu.cn>
>>> Sent: 2020-02-13 16:16:54 (Thursday)
>>> To: dev@iotdb.apache.org
>>> Cc: 
>>> Subject: Re: Suggestions for new TsFile
>>> 
>>> Hi,
>>> 
>>> +1, most queries contain a time filter.
>>> 
>>> But I don't understand what you mean by "add a time attribute". Add it where?
>>> 
>>> Thanks,
>>> --
>>> Jialin Qiao
>>> School of Software, Tsinghua University
>>> 
>>> 乔嘉林
>>> 清华大学 软件学院
>>> 
>>>> -----Original Message-----
>>>> From: "Dawei Liu" <atoi...@163.com>
>>>> Sent: 2020-02-13 15:55:48 (Thursday)
>>>> To: dev@iotdb.apache.org
>>>> Cc: 
>>>> Subject: Re: Suggestions for new TsFile
>>>> 
>>>> Hi,
>>>> 
>>>> I found another problem. When I execute `SELECT s1 FROM xx WHERE time = 1`
>>>> on the new TsFile, we need to read from the hard drive 3 times:
>>>> 
>>>> 1. Read TsFileMetaData.
>>>> 
>>>> 2. Read the metadata of all measurements of the device (TimeSeriesMetaData).
>>>> 
>>>> 3. Read the ChunkMetaData of the required measurement. Only then can the
>>>> time filter (time = 1) determine which Chunks can be used.
>>>> 
>>>> 
>>>> 
>>>> In the current server, most of the time is spent on the TimeFilter. We read
>>>> a lot of metadata, and if in the end none of it can be used, that is a very
>>>> big loss.
>>>> 
>>>> So I think we should add a time attribute, so that we can tell on the first
>>>> read from the hard drive whether the file can be skipped.
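The early-exit idea above can be sketched in Python as follows. This assumes a per-file map from device to its start and end time is available after the first read. `TimeRange`, `can_skip_file`, and the map layout are hypothetical names for illustration, not the TsFile format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TimeRange:
    start_time: int
    end_time: int

    def contains(self, t: int) -> bool:
        return self.start_time <= t <= self.end_time


def can_skip_file(device_ranges: Dict[str, TimeRange], device: str, t: int) -> bool:
    """Return True when a `time = t` query on `device` cannot match this file."""
    time_range = device_ranges.get(device)
    # An unknown device or a timestamp outside the range means there is
    # nothing to read, so steps 2 and 3 can be skipped entirely.
    return time_range is None or not time_range.contains(t)


# Hypothetical per-file attribute: device -> [start_time, end_time].
ranges = {"d0": TimeRange(100, 200)}
skip_out_of_range = can_skip_file(ranges, "d0", 1)    # time = 1 is before 100
skip_in_range = can_skip_file(ranges, "d0", 150)      # 150 falls inside the range
```

With such a check, a query like `time = 1` against a file covering 100 to 200 would stop after the first read.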
>>>> 
>>>> Thanks
>>>> 
>>>> Dawei Liu
>>>> 
>>>> 
>>>>> On Feb 12, 2020, at 8:03 PM, Dawei Liu <atoi...@163.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I see it. It looks very good.
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> Dawei Liu
>>>>> 
>>>>>> On Feb 12, 2020, at 6:52 PM, Haonan Hou <hhao...@outlook.com> wrote:
>>>>>> 
>>>>>> Sure, already added.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Haonan Hou
>>>>>> 
>>>>>>> On Feb 12, 2020, at 5:57 PM, Jialin Qiao <qj...@mails.tsinghua.edu.cn> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi Haonan,
>>>>>>> 
>>>>>>> 
>>>>>>> I cannot see the picture. Could you please put it in the PR?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Jialin Qiao
>>>>>>> School of Software, Tsinghua University
>>>>>>> 
>>>>>>> 乔嘉林
>>>>>>> 清华大学 软件学院
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: "Haonan Hou" <hhao...@outlook.com>
>>>>>>> Sent: 2020-02-12 16:27:22 (Wednesday)
>>>>>>> To: "dev@iotdb.apache.org" <dev@iotdb.apache.org>
>>>>>>> Cc:
>>>>>>> Subject: Re: Suggestions for new TsFile
>>>>>>> 
>>>>>>> Hi, 
>>>>>>> 
>>>>>>> 
>>>>>>> We have a newer design of TsFile, which combines the suggestions from 
>>>>>>> Jialin and Dawei. 
>>>>>>> 
>>>>>>> 
>>>>>>> The main differences are as follows:
>>>>>>> 
>>>>>>> 
>>>>>>> 1. Remove TsOffsetArray.
>>>>>>> 2. Modify the device map in TsFileMetaData to store, for each device, the
>>>>>>> start offset of its first TimeseriesMetadata and the total data size of
>>>>>>> all its TimeseriesMetadatas.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> The newer TsFile structure should look like this:
>>>>>>> 
>>>>>>> Here is an example of how the new structure works.
>>>>>>> 
>>>>>>> 
>>>>>>> When we try to get the List<ChunkMetadata> of timeseries "d0.s1", we first
>>>>>>> deserialize the map in TsFileMetadata. This gives us the startOffset of
>>>>>>> TimeseriesMetadata "s0", which is the first TimeseriesMetadata of "d0",
>>>>>>> and the data size of all TimeseriesMetadatas in "d0".
>>>>>>> 
>>>>>>> 
>>>>>>> After that, we are able to deserialize all TimeseriesMetadata in “d0”. 
>>>>>>> 
>>>>>>> 
>>>>>>> Finally we have the TimeseriesMetadata "d0.s1" and can get the 
>>>>>>> ChunkMetadata List of "d0.s1".
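The lookup walked through above can be sketched in Python under a toy encoding. The byte layout used here (length-prefixed name followed by a list of 8-byte chunk metadata offsets) and the helper names `encode_entry`, `decode_entries`, and `chunk_metadata_offsets` are assumptions made for illustration. They are not the real TsFile serialization.

```python
import struct
from typing import Dict, List, Tuple


def encode_entry(name: str, chunk_offsets: List[int]) -> bytes:
    """Serialize one toy TimeseriesMetadata: name plus chunk metadata offsets."""
    raw = name.encode("utf-8")
    return (struct.pack(">H", len(raw)) + raw
            + struct.pack(">H", len(chunk_offsets))
            + struct.pack(f">{len(chunk_offsets)}q", *chunk_offsets))


def decode_entries(block: bytes) -> Dict[str, List[int]]:
    """Deserialize every entry in one device's contiguous block."""
    entries: Dict[str, List[int]] = {}
    pos = 0
    while pos < len(block):
        (name_len,) = struct.unpack_from(">H", block, pos)
        pos += 2
        name = block[pos:pos + name_len].decode("utf-8")
        pos += name_len
        (count,) = struct.unpack_from(">H", block, pos)
        pos += 2
        entries[name] = list(struct.unpack_from(f">{count}q", block, pos))
        pos += 8 * count
    return entries


def chunk_metadata_offsets(file_bytes: bytes,
                           device_map: Dict[str, Tuple[int, int]],
                           device: str, measurement: str) -> List[int]:
    """Use (start offset, total size) to read one device's block in one slice."""
    start, size = device_map[device]
    block = file_bytes[start:start + size]
    return decode_entries(block)[measurement]


# Toy file: d0 has s0 and s1, d1 has s0; entries of a device are contiguous.
d0_block = encode_entry("s0", [10, 20]) + encode_entry("s1", [30, 40, 50])
d1_block = encode_entry("s0", [60])
toy_file = d0_block + d1_block
toy_map = {"d0": (0, len(d0_block)), "d1": (len(d0_block), len(d1_block))}
d0_s1 = chunk_metadata_offsets(toy_file, toy_map, "d0", "s1")
```

Storing the total size next to the start offset is what lets the reader fetch the whole device block in one pass instead of following per-series offsets.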
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> 
>>>>>>> Haonan Hou
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Feb 11, 2020, at 8:08 PM, Jialin Qiao <qiaojia...@apache.org> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> If each device only stores the offset of each TimeseriesMetadata, like this:
>>>>>>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>>>>>>> 
>>>>>>> it could be simplified to recording the start offset and end offset:
>>>>>>> TsFileMetaData ---> [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ]
>>>>>>> 
>>>>>>> And finally, it could be replaced by the start offset alone:
>>>>>>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
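A small Python sketch of why the three forms carry the same information, using the toy numbers from this message. It treats the values as consecutive positions, as in the example, and assumes the reader also knows where the last device's entries end (`total` below). All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def to_ranges(index: Dict[str, List[int]]) -> Dict[str, Tuple[int, int]]:
    """Full offset lists -> (start, end) per device."""
    return {device: (offsets[0], offsets[-1]) for device, offsets in index.items()}


def to_starts(index: Dict[str, List[int]]) -> Dict[str, int]:
    """Full offset lists -> start only per device."""
    return {device: offsets[0] for device, offsets in index.items()}


def from_starts(starts: Dict[str, int], total: int) -> Dict[str, List[int]]:
    """Rebuild the full lists: each device ends where the next one starts."""
    devices = sorted(starts, key=starts.get)
    bounds = [starts[d] for d in devices] + [total]
    return {d: list(range(bounds[i], bounds[i + 1])) for i, d in enumerate(devices)}


full = {"d0": [0, 1, 2], "d1": [3, 4, 5]}
ranges = to_ranges(full)
starts = to_starts(full)
rebuilt = from_starts(starts, total=6)
```

The round trip only works because the entries of each device sit next to each other, which is the property the simplification relies on.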
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> —————————————————
>>>>>>> Jialin Qiao
>>>>>>> School of Software, Tsinghua University
>>>>>>> 
>>>>>>> 乔嘉林
>>>>>>> 清华大学 软件学院
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Feb 11, 2020 at 7:59 PM, atoiLiu <atoi...@163.com> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Thank you for your reply.
>>>>>>> I am very happy that you can take my suggestion.
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> Dawei Liu
>>>>>>> 
>>>>>>> 
>>>>>>> On Feb 11, 2020, at 6:04 PM, Haonan Hou <hhao...@outlook.com> wrote:
>>>>>>> 
>>>>>>> Hi Dawei,
>>>>>>> 
>>>>>>> Thank you so much for sharing your opinion about the new TsFile!
>>>>>>> I am very happy to take your suggestions.
>>>>>>> 
>>>>>>> You said we can remove TsOffsetArray and directly store the offset of
>>>>>>> TimeseriesMetaData. I agree with you. It is better than my version.
>>>>>>> As for the optimization of TimeseriesMetaData, I would like to
>>>>>>> discuss it with other people to determine which way is better.
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Haonan Hou
>>>>>>> 
>>>>>>> 
>>>>>>> On Feb 11, 2020, at 5:35 PM, atoiLiu <atoi...@163.com> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm studying the new TsFile in PR [1], but I think TsFileMetaData has a
>>>>>>> design problem.
>>>>>>> 
>>>>>>> TsFileMetaData has a TsOffsetArray, which records the offset of every
>>>>>>> TimeseriesMetaData, and uses a Map<deviceId, int[]> to record each
>>>>>>> device's startIndex and endIndex in the TsOffsetArray. It looks like:
>>>>>>> 
>>>>>>> TsFileMetaData ---> { [0,1,2,3,4,5, …], [ {deviceId(d0), [0,2] }, {deviceId(d1), [3,5] }, … ] }
>>>>>>> 
>>>>>>> We can delete the TsOffsetArray and store the offsets directly in the
>>>>>>> deviceIndexArray, so that TsFileMetaData has a Map<deviceId, List<Long>>
>>>>>>> to record them. This change will save 4 bytes per device on disk, because
>>>>>>> each device only needs to record the number of offsets and the offsets
>>>>>>> themselves. It looks like:
>>>>>>> 
>>>>>>> TsFileMetaData ---> [ {deviceId(d0), [0,1,2] }, {deviceId(d1), [3,4,5] }, … ]
>>>>>>> 
>>>>>>> 
>>>>>>> In addition, TimeSeriesMetaData is an ordered structure on the hard
>>>>>>> disk, and the TimeSeriesMetaData of each device are stored together, so
>>>>>>> TsFileMetaData does not need to store all the offset information. There
>>>>>>> are two optimization directions:
>>>>>>> 
>>>>>>> 1. Save the startTime, endTime and offset of each TimeSeriesMetaData in
>>>>>>> TsFileMetaData. The nice thing about this is that when you read
>>>>>>> TsFileMetaData from the hard drive, you can directly apply a filter to
>>>>>>> skip the TimeSeriesMetaData that do not need to be read.
>>>>>>> 
>>>>>>> 
>>>>>>> 2. Only save the offset of the first TimeSeriesMetaData in TsFileMetaData,
>>>>>>> so that you can iterate through them with a single seek. It looks like:
>>>>>>> 
>>>>>>> TsFileMetaData ---> [ {deviceId(d0), 0 }, {deviceId(d1), 3 }, … ]
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> [1] https://github.com/apache/incubator-iotdb/pull/736
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> Dawei Liu
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>
