[jira] [Commented] (IOTDB-306) count query is not that fast

Jialin Qiao (Jira) Mon, 17 Feb 2020 17:08:14 -0800


    [ 
https://issues.apache.org/jira/browse/IOTDB-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038708#comment-17038708
 ]


Jialin Qiao commented on IOTDB-306:
-----------------------------------

[https://github.com/apache/incubator-iotdb/pull/713]

> count query is not that fast
> ----------------------------
>
>                 Key: IOTDB-306
>                 URL: https://issues.apache.org/jira/browse/IOTDB-306
>             Project: Apache IoTDB
>          Issue Type: Improvement
>            Reporter: Lei Rui
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> According to my test, 
> *q1: select count(s_10) from root.group_0.d_17 where 
> time>=2018-09-20T00:00:00+08:00 and time<=2018-09-20T23:59:59+08:00*
> ||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
> |23,998|1,367|13,591|7,592|
>  Unit: ms
> *q2: select s_10 from root.group_0.d_17 where time>=2018-09-20T00:00:00+08:00 
> and time<=2018-09-20T23:59:59+08:00*
> ||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
> |27,783|31.2+2,068|134+13,880|14.9+9,587|
>  Unit: ms
> (A "+" appears where a step runs in both the `createNewDataSet` and 
> `convertQueryDataSetByFetchSize` phases.)
> As shown, the total time cost of q1 is only slightly smaller than that of q2 
> (23,998 ms vs. 27,783 ms). The costs of the three major steps - 
> `readTsFileMetaData`, `readTsDeviceMetaData`, and `readMemChunk` - are very 
> close in the two queries. 
> The reason is that the count query reads chunk data from disk into memory 
> anyway, and in the best case uses the statistics (i.e., numOfPoints) in the 
> pageHeader instead of reading page data. However, reading page data (see 
> `ChunkReader.nextBatch`) is not that expensive, as it is performed in memory. 
> Therefore, the execution process of a count query mostly overlaps with that 
> of the same query without count.
> Other aggregate queries probably show similar results.
> One direction for improving the performance of count queries (and probably 
> other aggregate queries) is to avoid `readMemChunk` whenever the statistics 
> in the ChunkMetaData can be used.
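
The direction proposed in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not IoTDB's actual code and not necessarily what the linked pull request does: `ChunkMeta`, `count`, and `readAndCount` are hypothetical names, and the sketch assumes that chunks do not overlap in time and contain no deleted or overwritten points. A chunk whose time range lies entirely inside the query's time range contributes its stored point count directly; only chunks that partially overlap the range fall back to reading chunk data.

```java
import java.util.List;
import java.util.function.ToLongFunction;

final class ChunkStatisticsCount {

    /** Hypothetical stand-in for the per-chunk statistics kept in ChunkMetaData. */
    static final class ChunkMeta {
        final long startTime;   // timestamp of the first point in the chunk
        final long endTime;     // timestamp of the last point in the chunk
        final long numOfPoints; // number of points stored in the chunk

        ChunkMeta(long startTime, long endTime, long numOfPoints) {
            this.startTime = startTime;
            this.endTime = endTime;
            this.numOfPoints = numOfPoints;
        }
    }

    /**
     * Counts the points with queryStart <= time <= queryEnd.
     * readAndCount is a hypothetical callback that loads a chunk and counts
     * its matching points; it is the expensive path the sketch tries to avoid.
     */
    static long count(List<ChunkMeta> chunks, long queryStart, long queryEnd,
                      ToLongFunction<ChunkMeta> readAndCount) {
        long total = 0;
        for (ChunkMeta chunk : chunks) {
            if (chunk.endTime < queryStart || chunk.startTime > queryEnd) {
                continue; // no overlap with the query range: skip the chunk
            }
            if (chunk.startTime >= queryStart && chunk.endTime <= queryEnd) {
                total += chunk.numOfPoints; // fully covered: metadata is enough
            } else {
                total += readAndCount.applyAsLong(chunk); // partial overlap: read the chunk
            }
        }
        return total;
    }
}
```

With a time filter spanning a whole day, as in q1, at most the chunks at the two boundaries of the range would need to be read under these assumptions.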



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
