Hi,

I performed it directly, without JMH.
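
Roughly, I wrap a timer around the result-set iteration. A minimal sketch
mirroring SessionExample.queryByIterator (assuming the 0.11 Session API,
org.apache.iotdb.session.Session / SessionDataSet; session is an already
opened Session, and exception handling is omitted):

    long start = System.nanoTime();
    // iterate the raw result set without building any other data structures
    SessionDataSet dataSet =
        session.executeQueryStatement("select s1 from root.sg1.d1");
    SessionDataSet.DataIterator iterator = dataSet.iterator();
    while (iterator.next()) {
      // consume the row only
    }
    dataSet.closeOperationHandle();
    System.out.println("query time cost in ms: "
        + (System.nanoTime() - start) / 1_000_000);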

A few supplementary details:

- The version I used is 0.11.0-SNAPSHOT on the master branch. The specific 
commit does not matter much, because we have not modified the query engine 
much since 0.10.0.
- Before each query execution, I restarted IoTDB to avoid the influence of 
the cache on the query.
- Another factor that accelerates the query is that the number of TsFiles 
decreases (tsfile_size_threshold grows together with memtable_size_threshold), 
i.e., fewer TsFileMetadata need to be read.

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -----Original Message-----
> From: "Julian Feinauer" <[email protected]>
> Sent: 2020-07-09 15:39:57 (Thursday)
> To: "[email protected]" <[email protected]>
> Cc: 
> Subject: Re: [Experiment sharing] How chunk size (number of points) impacts 
> the query performance
> 
> Hey,
> 
> Very interesting experiment.
> Did you use something like JMH for the benchmark, or did you perform it 
> directly?
> 
> Julian
> 
> On 09.07.20, 09:29, "孙泽嵩" <[email protected]> wrote:
> 
>     Hi Jialin,
> 
>     Great experiment! Thanks for sharing.
> 
>     Looking forward to the hot compaction feature.
> 
> 
>     Best,
>     -----------------------------------
>     Zesong Sun
>     School of Software, Tsinghua University
> 
>     孙泽嵩
>     清华大学 软件学院
> 
>     > On July 8, 2020, at 16:39, Jialin Qiao <[email protected]> wrote:
>     > 
>     > Hi,
>     > 
>     > 
>     > I'd like to share with you some experiment results about how the chunk 
>     > size impacts query performance.
>     > 
>     > 
>     > Hardware: 
>     > MacBook Pro (Retina, 15-inch, Mid 2015)
>     > CPU: 2.2 GHz Intel Core i7
>     > Memory: 16 GB 1600 MHz DDR3
>     > I used a portable HDD (SEAGATE, 1 TB, Model SRD00F1) as the storage.
>     > 
>     > 
>     > Workload: 1 storage group, 1 device, 100 measurements of long type, and 
>     > 1 million randomly generated data points per time series.
>     > 
>     > 
>     > Some background: the originally flushed chunk size (in points) = 
>     > memtable_size_threshold / number of series / bytes per data point 
>     > (16 for long data points).
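>     > 
>     > For example, with memtable_size_threshold = 160000, 100 series, and 16 
>     > bytes per long data point, the flushed chunk size is 160000 / 100 / 16 
>     > = 100 points; each 10x increase of the threshold gives 10x larger 
>     > chunks, up to 1,000,000 points at 1600000000.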
>     > 
>     > 
>     > I adjusted the memtable_size_threshold to control the chunk size.
>     > 
>     > 
>     > Configurations of IoTDB:
>     > 
>     > 
>     > enable_parameter_adapter=false
>     > avg_series_point_number_threshold=10000000 (to make the 
>     > memtable_size_threshold valid)
>     > page_size_in_byte=1000000000 (each chunk has one page)
>     > tsfile_size_threshold = memtable_size_threshold = 
>     > 160000 / 1600000 / 16000000 / 160000000 / 1600000000
>     > 
>     > 
>     > I used SessionExample.insertTablet to insert data under each 
>     > configuration, which produced chunk sizes from 100 to 1,000,000 points.
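>     > 
>     > The insert loop looks roughly like this (a minimal sketch along the 
>     > lines of SessionExample.insertTablet, assuming the 0.11 Session/Tablet 
>     > API, org.apache.iotdb.tsfile.write.record.Tablet and 
>     > org.apache.iotdb.tsfile.write.schema.MeasurementSchema; session is an 
>     > already opened Session, and the batch size of 1000 rows is illustrative):
>     > 
>     >     // 100 long-typed measurements s1..s100 under one device
>     >     List<MeasurementSchema> schemas = new ArrayList<>();
>     >     for (int i = 1; i <= 100; i++) {
>     >       schemas.add(new MeasurementSchema("s" + i, TSDataType.INT64));
>     >     }
>     >     Tablet tablet = new Tablet("root.sg1.d1", schemas, 1000);
>     >     Random random = new Random();
>     >     for (long time = 0; time < 1000000; time++) {
>     >       int row = tablet.rowSize++;
>     >       tablet.timestamps[row] = time;
>     >       for (int i = 0; i < 100; i++) {
>     >         // fill the column array of measurement s(i+1) with random longs
>     >         ((long[]) tablet.values[i])[row] = random.nextLong();
>     >       }
>     >       if (tablet.rowSize == tablet.getMaxRowNumber()) {
>     >         session.insertTablet(tablet);
>     >         tablet.reset();
>     >       }
>     >     }
>     >     if (tablet.rowSize != 0) {
>     >       session.insertTablet(tablet);
>     >       tablet.reset();
>     >     }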
>     > 
>     > 
>     > Then I used SessionExample.queryByIterator to iterate over the result 
>     > set of "select s1 from root.sg1.d1" without constructing any other data 
>     > structures.
>     > 
>     > 
>     > The results are:
>     > 
>     > 
>     > | chunk size | query time cost (ms) |
>     > | ---------- | -------------------- |
>     > |        100 |                47620 |
>     > |       1000 |                13984 |
>     > |      10000 |                 2416 |
>     > |     100000 |                 1322 |
>     > 
>     > 
>     > As we can see, the chunk size has a dominant impact on raw data query 
>     > performance. In the current query engine, a Chunk is the basic unit of 
>     > data read from disk: reading each Chunk needs one seek plus one I/O 
>     > operation, so a larger chunk size means fewer Chunks to read.
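>     > 
>     > For example, the 1 million points of s1 span 1000000 / 100 = 10000 
>     > Chunks at chunk size 100, i.e., about 10000 seeks, but only 
>     > 1000000 / 100000 = 10 Chunks at chunk size 100000.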
>     > 
>     > 
>     > Therefore, it is better to enlarge memtable_size_threshold to 
>     > accelerate queries. However, a larger memtable_size_threshold requires 
>     > more memory, which is not always available. Therefore, we need 
>     > compaction, either hot compaction triggered during flushing or a timed 
>     > compaction strategy, to merge small chunks into large ones.
>     > 
>     > 
>     > Thanks,
>     > --
>     > Jialin Qiao
>     > School of Software, Tsinghua University
>     > 
>     > 乔嘉林
>     > 清华大学 软件学院
> 
> 
