Lei Rui created IOTDB-306:
-----------------------------
Summary: count query is not that fast
Key: IOTDB-306
URL: https://issues.apache.org/jira/browse/IOTDB-306
Project: Apache IoTDB
Issue Type: Improvement
Reporter: Lei Rui
According to my test,
*q1: select count(s_10) from root.group_0.d_17 where time>=2018-09-20T00:00:00+08:00 and time<=2018-09-20T23:59:59+08:00*
||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
|23,998|1,367|13,591|7,592|
Unit: ms
*q2: select s_10 from root.group_0.d_17 where time>=2018-09-20T00:00:00+08:00 and time<=2018-09-20T23:59:59+08:00*
||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
|27,783|31.2+2,068|134+13,880|14.9+9,587|
Unit: ms
(Each "+" separates two costs for the same step: q2 executes that step in both the `createNewData` phase and the `convertQueryDataSetByFetchSize` phase.)
As shown, the total time cost of q1 is only slightly smaller than that of q2 (23,998 ms vs. 27,783 ms, about 14% less).
The costs of the three major steps (`readTsFileMetaData`, `readTsDeviceMetaData`, and `readMemChunk`) are also close between the two queries.
The reason is that the count query still reads chunk data from disk into memory; in the best case, it then uses the statistics (i.e., numOfPoints) in the pageHeader instead of reading the page data. However, reading page data from the ChunkBuffer (see `ChunkReader.nextBatch`) is not expensive, because it happens in memory. Therefore, the execution of the count query largely overlaps with that of the corresponding query without count.
Other aggregate queries probably show similar results.
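A minimal sketch of the counting path described above, using hypothetical, simplified `Page` and `Chunk` classes (these are not IoTDB's real classes). The function receives a chunk that has already been read into memory, so the disk read (the `readMemChunk` cost) is paid before any statistic is consulted; the page-header `num_of_points` only saves the in-memory scan of page data.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical, simplified structures for illustration only.
@dataclass
class Page:
    start_time: int
    end_time: int
    num_of_points: int               # statistic stored in the page header
    points: List[Tuple[int, float]]  # (timestamp, value) pairs: the page data


@dataclass
class Chunk:
    pages: List[Page]


def count_after_loading_chunk(chunk: Chunk, t_min: int, t_max: int) -> int:
    """Count points with t_min <= time <= t_max in a chunk that has
    ALREADY been read from disk into memory."""
    count = 0
    for page in chunk.pages:
        if page.end_time < t_min or page.start_time > t_max:
            continue  # page does not overlap the time range
        if t_min <= page.start_time and page.end_time <= t_max:
            # Best case: the page lies fully inside the range, so the
            # header statistic answers the count without scanning points.
            count += page.num_of_points
        else:
            # Partial overlap: scan the in-memory page data point by point.
            count += sum(1 for t, _ in page.points if t_min <= t <= t_max)
    return count
```

Either branch is cheap because it works on memory; the expensive part, loading the chunk, already happened before this function runs, which is consistent with the similar `readMemChunk` costs of q1 and q2.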
One direction for performance improvement is to skip `readMemChunk` whenever the statistics in the ChunkMetaData can be used.
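A minimal sketch of that direction, assuming a hypothetical `ChunkMetaData` with start time, end time, and point count (not IoTDB's actual class or fields). A chunk whose whole time range lies inside the query range is counted from its metadata and never loaded; only chunks that partially overlap the range are read, through a caller-supplied `load_timestamps` function standing in for `readMemChunk` plus decoding. The sketch also assumes the chunk statistics are exact, i.e. not invalidated by deletions or overlapping out-of-order data; a real implementation would need to check for that before trusting them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Hypothetical metadata record for illustration only.
@dataclass
class ChunkMetaData:
    start_time: int
    end_time: int
    num_of_points: int


def count_with_chunk_statistics(
    metas: List[ChunkMetaData],
    t_min: int,
    t_max: int,
    load_timestamps: Callable[[ChunkMetaData], List[int]],
) -> Tuple[int, int]:
    """Return (count of points in [t_min, t_max], number of chunks loaded).

    Assumes each chunk's statistics are exact (no deletions or
    overlapping out-of-order data affect the chunk)."""
    count = 0
    loaded = 0
    for meta in metas:
        if meta.end_time < t_min or meta.start_time > t_max:
            continue  # no overlap: neither metadata nor data is needed
        if t_min <= meta.start_time and meta.end_time <= t_max:
            # Fully covered: answer from metadata, skip reading the chunk.
            count += meta.num_of_points
        else:
            # Partial overlap: fall back to reading and scanning the chunk.
            loaded += 1
            timestamps = load_timestamps(meta)
            count += sum(1 for t in timestamps if t_min <= t <= t_max)
    return count, loaded
```

For example, with three chunks covering times 0..99, 100..199 and 200..299 and a query range of 0..249, only the last chunk is loaded; the first two are answered from metadata alone.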