Good job! 

It looks like the cost of reading chunkMeta is acceptable. We do not need to
keep chunkMeta in memory, so hot compaction could be merged into the merge operation.



Thanks!

[email protected] 

 
From: Haonan Hou
Date: 2020-07-13 22:27
To: [email protected]
Subject: Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)
Hi,
 
I ran an experiment to measure what percentage of the whole read process is
spent reading metadata.
 
I created one TsFile with 1 storage group, 1 device and 100,000 measurements. 
Each measurement has 8 chunks of data and each chunk has 200 data points. The 
total size of the TsFile is 909.5MB.
 
Then I wrote and ran a test on an HDD that reads all ChunkMetadata and then
reads all data points one by one using the ChunkMetadata and a ChunkReader. This
gave me the time to read all ChunkMetadata and the time to read all data points.
 
The results are as follows:
 
Reading 800,000 ChunkMetadata costs 2977ms.
Reading 160,000,000 points costs 60780ms.
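As a quick sanity check on these figures, the sketch below recomputes the chunk and point counts from the file layout described above and the share of read time spent on metadata. It assumes the two timings are separate phases whose sum is the whole read; the variable names are illustrative only.

```python
# Figures from the experiment above (one TsFile on HDD, 909.5 MB).
measurements = 100_000
chunks_per_measurement = 8
points_per_chunk = 200

metadata_ms = 2977  # time to read all ChunkMetadata
points_ms = 60780   # time to read all data points

# Each measurement has 8 chunks, and each chunk has 200 points.
total_chunks = measurements * chunks_per_measurement
total_points = total_chunks * points_per_chunk

# Assumption: metadata reading and point reading are disjoint phases,
# so their sum is taken as the total read time.
metadata_share = metadata_ms / (metadata_ms + points_ms)

print(f"chunks: {total_chunks:,}")
print(f"points: {total_points:,}")
print(f"metadata share of total read time: {metadata_share:.1%}")
```

The counts match the 800,000 ChunkMetadata and 160,000,000 points reported above, and metadata reading comes out at roughly 4.7% of the total.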
 
Best wishes,
 
Haonan Hou
 
On Jul 12, 2020, at 8:48 PM, Jialin Qiao
<[email protected]> wrote:
 
Hi,
 
Thanks, Justin. I would like to add some details.
 
We first introduced the main idea of hot compaction: when a memtable reaches
the threshold but the average number of points in each series does not reach
our goal, we flush it to a vm (virtual memory) file. Once there are enough vm
files (a configurable number), we merge all vm files into the target TsFile and
close the TsFile. This way, we get larger chunks that are better suited to
queries.
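To make the flush-and-merge policy concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not IoTDB's implementation: the class name and the knobs `target_avg_points` and `max_vm_files` are hypothetical, vm files are modelled as in-memory dicts rather than on-disk files, and the behaviour when a memtable already meets the target (merge and seal immediately) is an assumption, since the description above only covers the case where it does not.

```python
from dataclasses import dataclass, field


@dataclass
class HotCompactionSketch:
    # Hypothetical knobs for illustration; not IoTDB configuration names.
    target_avg_points: int  # desired average number of points per series
    max_vm_files: int       # number of vm files to accumulate before merging
    vm_files: list = field(default_factory=list)        # unsealed vm files (data only)
    closed_tsfiles: list = field(default_factory=list)  # sealed target TsFiles

    def on_memtable_full(self, memtable: dict) -> str:
        """Handle a memtable that reached its size threshold.

        `memtable` maps a series name to its list of buffered points.
        """
        avg = sum(len(p) for p in memtable.values()) / max(len(memtable), 1)
        self.vm_files.append(dict(memtable))
        if avg < self.target_avg_points and len(self.vm_files) < self.max_vm_files:
            # Chunks would still be too small: keep the data in a vm file for now.
            return "flushed to vm file"
        # Enough vm files (or, by assumption, large enough chunks): merge and seal.
        self._merge_vm_files()
        return "merged into TsFile"

    def _merge_vm_files(self) -> None:
        # Concatenate each series across all vm files to form larger chunks.
        merged: dict = {}
        for vm in self.vm_files:
            for series, points in vm.items():
                merged.setdefault(series, []).extend(points)
        self.closed_tsfiles.append(merged)
        self.vm_files.clear()
```

For example, with `max_vm_files=3`, three small memtables of 10 points per series produce two vm flushes followed by one merge, yielding a TsFile with 30 points per series. The real design keeps only the vm files' metadata in memory; the sketch holds whole files in memory purely for simplicity.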
 
The reason we call it hot compaction rather than normal compaction is that the
vm file is not closed, which means it only has data chunks without the related
metadata. All metadata of the vm files is cached in memory. Therefore, we avoid
the IO, serialization and deserialization of this metadata during hot
compaction. It is essentially a trade of memory for IO and CPU.
 
However, we do not have a clear idea of what percentage of compaction time is
spent reading metadata. So we decided to do an experiment first.
 
Thanks,
--
Jialin Qiao
School of Software, Tsinghua University
 
 
-----Original Message-----
From: "Justin Mclean" <[email protected]>
Sent: 2020-07-12 17:31:30 (Sunday)
To: dev <[email protected]>
Cc:
Subject: Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)
 
Hi,
 
[Big Event]
# We held the first online discussion yesterday, and we look forward to more
attendees next time.
 
It would be good if the details of what was discussed were shared with this list.
 
Having meetings like this disadvantages those who cannot attend due to time
zones or other commitments.
 
Thanks,
Justin
 
