[
https://issues.apache.org/jira/browse/IOTDB-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038708#comment-17038708
]
Jialin Qiao commented on IOTDB-306:
-----------------------------------
[https://github.com/apache/incubator-iotdb/pull/713]
> count query is not that fast
> ----------------------------
>
> Key: IOTDB-306
> URL: https://issues.apache.org/jira/browse/IOTDB-306
> Project: Apache IoTDB
> Issue Type: Improvement
> Reporter: Lei Rui
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> According to my test,
> *q1: select count(s_10) from root.group_0.d_17 where
> time>=2018-09-20T00:00:00+08:00 and time<=2018-09-20T23:59:59+08:00*
> ||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
> |23,998|1,367|13,591|7,592|
> Unit: ms
> *q2: select s_10 from root.group_0.d_17 where time>=2018-09-20T00:00:00+08:00
> and time<=2018-09-20T23:59:59+08:00*
> ||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
> |27,783|31.2+2,068|134+13,880|14.9+9,587|
> Unit: ms
> (Each "+" separates the costs of the `createNewDataSet` and
> `convertQueryDataSetByFetchSize` phases, because the step runs in both.)
> As shown, the total time cost of q1 (23,998 ms) is only slightly smaller than
> that of q2 (27,783 ms), and the costs of the three major steps
> (`readTsFileMetaData`, `readTsDeviceMetaData`, and `readMemChunk`) are also
> very close between the two queries.
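To make the comparison concrete, the q2 entries containing a "+" can be summed per step and set against q1. The snippet below is only an arithmetic check on the figures in the two tables; the class and method names are illustrative and not part of IoTDB.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CountQueryCostComparison {
    // Per-step costs in ms, copied from the q1 and q2 tables.
    // For q2, the two "+" parts (createNewDataSet and
    // convertQueryDataSetByFetchSize) are added together.
    static Map<String, double[]> steps() {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("readTsFileMetaData", new double[] {1_367, 31.2 + 2_068});
        m.put("readTsDeviceMetaData", new double[] {13_591, 134 + 13_880});
        m.put("readMemChunk", new double[] {7_592, 14.9 + 9_587});
        return m;
    }

    // q1 cost divided by q2 cost: values near 1.0 mean the count query saved little.
    static double ratio(double q1, double q2) {
        return q1 / q2;
    }

    public static void main(String[] args) {
        System.out.printf("total: q1/q2 = %.3f%n", ratio(23_998, 27_783));
        for (Map.Entry<String, double[]> e : steps().entrySet()) {
            double[] c = e.getValue();
            System.out.printf("%s: q1 = %.1f ms, q2 = %.1f ms, q1/q2 = %.3f%n",
                    e.getKey(), c[0], c[1], ratio(c[0], c[1]));
        }
    }
}
```

Added up this way, q2 spends about 2,099 ms, 14,014 ms and 9,602 ms in the three steps, and `readTsDeviceMetaData` is the largest cost in both queries.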
> The reason is that the count query still reads chunk data from disk into
> memory; in the best case, it then uses the statistics in the page header
> (i.e., numOfPoints) instead of reading the page data. However, reading page
> data (see `ChunkReader.nextBatch`) is not expensive, as it is performed in
> memory. Therefore, the execution of the count query largely overlaps with
> that of the same query without count.
> Other aggregate queries probably show similar results.
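The behaviour described above can be sketched roughly as follows. This is a simplified model, not IoTDB's actual reader code: `Page`, `loadChunkFromDisk` and the `chunkReads` counter are hypothetical stand-ins, and it assumes the page-header statistic is used only when the whole page falls inside the time filter. The point it illustrates is that the chunk is always loaded (the `readMemChunk` cost) before any page-header statistic can be consulted, so the statistics only save the cheap in-memory decoding.

```java
import java.util.Arrays;
import java.util.List;

public class CurrentCountSketch {
    // Hypothetical page model: header statistics plus the (already sorted)
    // timestamps that decoding the page data would produce.
    static final class Page {
        final long minTime;
        final long maxTime;
        final int numOfPoints; // statistic kept in the page header
        final long[] timestamps; // stands in for the encoded page data

        Page(long... timestamps) {
            this.timestamps = timestamps;
            this.minTime = timestamps[0];
            this.maxTime = timestamps[timestamps.length - 1];
            this.numOfPoints = timestamps.length;
        }
    }

    // Number of times a chunk was read "from disk" (the readMemChunk step).
    static int chunkReads = 0;

    // Hypothetical stand-in for reading a whole chunk into memory.
    static List<Page> loadChunkFromDisk(List<Page> chunkOnDisk) {
        chunkReads++;
        return chunkOnDisk;
    }

    static long count(List<Page> chunkOnDisk, long start, long end) {
        // The chunk is loaded unconditionally, even if every page could be
        // answered from its header statistics.
        List<Page> pages = loadChunkFromDisk(chunkOnDisk);
        long total = 0;
        for (Page p : pages) {
            if (p.maxTime < start || p.minTime > end) {
                continue; // page lies outside the time filter
            }
            if (p.minTime >= start && p.maxTime <= end) {
                total += p.numOfPoints; // best case: use numOfPoints from the header
            } else {
                for (long t : p.timestamps) { // otherwise decode in memory
                    if (t >= start && t <= end) {
                        total++;
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Page> chunk = Arrays.asList(new Page(1, 2, 3), new Page(4, 5, 6));
        System.out.println(count(chunk, 2, 6)); // prints 5 (2 decoded + 3 from the header)
        System.out.println("chunk reads: " + chunkReads); // prints "chunk reads: 1"
    }
}
```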
> One direction for improving the performance of count queries (and probably
> other aggregate queries) is to avoid `readMemChunk` whenever the statistics in
> the ChunkMetaData can be used.
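A minimal sketch of that direction, assuming each chunk's metadata carries its time range and point count (the issue only says ChunkMetaData has statistics; `ChunkMeta`, its fields, `dense` and the `loader` are hypothetical). If a chunk's time range lies entirely inside the query filter, its count comes straight from the metadata and the chunk is never loaded; only chunks that partially overlap the filter fall back to loading. The sketch ignores cases such as deleted or overlapping data, where metadata statistics alone may not be trustworthy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class MetadataCountSketch {
    // Hypothetical chunk metadata: time range and point count, plus a loader
    // that simulates reading and decoding the chunk (the readMemChunk step).
    static final class ChunkMeta {
        final long startTime;
        final long endTime;
        final long numOfPoints;
        final Supplier<long[]> loader;

        ChunkMeta(long startTime, long endTime, long numOfPoints, Supplier<long[]> loader) {
            this.startTime = startTime;
            this.endTime = endTime;
            this.numOfPoints = numOfPoints;
            this.loader = loader;
        }

        // Hypothetical helper: a chunk holding one point at every timestamp in [from, to].
        static ChunkMeta dense(long from, long to) {
            return new ChunkMeta(from, to, to - from + 1, () -> {
                long[] ts = new long[(int) (to - from + 1)];
                for (int i = 0; i < ts.length; i++) {
                    ts[i] = from + i;
                }
                return ts;
            });
        }
    }

    // Number of chunks that actually had to be loaded.
    static int chunksLoaded = 0;

    static long count(List<ChunkMeta> chunks, long start, long end) {
        long total = 0;
        for (ChunkMeta c : chunks) {
            if (c.endTime < start || c.startTime > end) {
                continue; // no overlap with the filter: nothing to read
            }
            if (c.startTime >= start && c.endTime <= end) {
                total += c.numOfPoints; // fully covered: answer from metadata only
                continue;
            }
            chunksLoaded++; // partial overlap: fall back to loading the chunk
            for (long t : c.loader.get()) {
                if (t >= start && t <= end) {
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<ChunkMeta> chunks = Arrays.asList(
                ChunkMeta.dense(0, 9), ChunkMeta.dense(95, 105), ChunkMeta.dense(200, 300));
        System.out.println(count(chunks, 0, 100)); // prints 16 (10 from metadata + 6 decoded)
        System.out.println("chunks loaded: " + chunksLoaded); // prints "chunks loaded: 1"
    }
}
```

Only the chunk straddling the filter boundary is read; chunks fully inside or fully outside the time range cost nothing beyond their metadata, which is exactly the `readMemChunk` work the issue suggests avoiding.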