Hi,

I ran an experiment to measure what percentage of the whole read process is spent reading metadata.

I created one TsFile with 1 storage group, 1 device, and 100,000 measurements. Each measurement has 8 chunks of data, and each chunk has 200 data points. The total size of the TsFile is 909.5 MB.

I then ran a test on an HDD that reads every ChunkMetadata and then reads all data points one by one using the ChunkMetadata and ChunkReader. This gave me the time for reading all ChunkMetadata and the time for reading all data points.

The final result is as follows:

Reading 800,000 ChunkMetadata costs 2977 ms.
Reading 160,000,000 points costs 60780 ms.

Best wishes,
Haonan Hou

On Jul 12, 2020, at 8:48 PM, Jialin Qiao <[email protected]> wrote:

Hi,

Thanks Justin, I would like to add some details.

We first introduced the main idea of hot compaction: when a memtable reaches the threshold but the average number of points in each series has not reached our goal, we flush it to a vm (virtual memory) file. Once there are enough vm files (the number is configurable), we merge all of them into the target TsFile and close the TsFile. In this way, we get larger chunks that are better suited to queries.

The reason we call it hot compaction rather than normal compaction is that the vm file is not closed, which means it contains only data chunks without the related metadata. All metadata of the vm files is cached in memory. Therefore, we avoid the I/O, serialization, and deserialization of this metadata when doing hot compaction. It is essentially trading memory for I/O and CPU.

However, we do not have a clear idea of what percentage of compaction time is spent reading metadata, so we decided to do an experiment first.

Thanks,
--
Jialin Qiao (乔嘉林)
School of Software, Tsinghua University

-----Original Message-----
From: "Justin Mclean" <[email protected]>
Sent: 2020-07-12 17:31:30 (Sunday)
To: dev <[email protected]>
Cc:
Subject: Re: [Weekly Report] IoTDB Weekly News (2020-07-04~2020-07-12)

Hi,

> [Big Event]
> # We held the first online discussion yesterday, looking forward to more attendances next time.

It would be good if the details of what was discussed were shared with this list. Having meetings like this disadvantages those who cannot attend due to time zones or other commitments.

Thanks,
Justin
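
A note on the numbers at the top of this thread: the share of read time spent on metadata is not stated explicitly, but it follows from the two reported timings. The sketch below works it out in Java. The class and method names are hypothetical, the inputs are just the figures from the experiment, and it assumes the two timings are disjoint and together make up the total read time.

```java
import java.util.Locale;

// Hypothetical helper, not part of IoTDB: reworks the figures reported in the experiment.
final class MetadataShare {

    // Total number of chunks in the file (one ChunkMetadata per chunk).
    static long totalChunks(long measurements, long chunksPerMeasurement) {
        return measurements * chunksPerMeasurement;
    }

    // Percentage of the total read time spent on metadata,
    // assuming total = metadata time + data time.
    static double metadataSharePercent(long metadataMs, long dataMs) {
        return 100.0 * metadataMs / (metadataMs + dataMs);
    }

    // Average cost per item in microseconds.
    static double microsPerItem(long elapsedMs, long items) {
        return 1000.0 * elapsedMs / items;
    }

    public static void main(String[] args) {
        long chunks = totalChunks(100_000L, 8L);   // 800,000 ChunkMetadata
        long points = chunks * 200L;               // 160,000,000 data points
        long metadataMs = 2977L;                   // reported: reading all ChunkMetadata
        long dataMs = 60780L;                      // reported: reading all data points

        System.out.printf(Locale.ROOT, "chunks=%d, points=%d%n", chunks, points);
        System.out.printf(Locale.ROOT, "metadata share: %.2f%%%n",
                metadataSharePercent(metadataMs, dataMs));
        System.out.printf(Locale.ROOT, "per ChunkMetadata: %.2f us, per point: %.2f us%n",
                microsPerItem(metadataMs, chunks), microsPerItem(dataMs, points));
    }
}
```

With these inputs, reading metadata accounts for roughly 4.7% of the total read time in this test. The chunk and point counts also match the reported 800,000 and 160,000,000. This covers only a sequential read on one HDD, so it may not carry over directly to compaction.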
