Lei Rui created IOTDB-306:
-----------------------------

             Summary: count query is not that fast
                 Key: IOTDB-306
                 URL: https://issues.apache.org/jira/browse/IOTDB-306
             Project: Apache IoTDB
          Issue Type: Improvement
            Reporter: Lei Rui


According to my test:

*q1: select count(s_10) from root.group_0.d_17 where 
time>=2018-09-20T00:00:00+08:00 and time<=2018-09-20T23:59:59+08:00*
||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
|23,998|1,367|13,591|7,592|

 Unit: ms

*q2: select s_10 from root.group_0.d_17 where time>=2018-09-20T00:00:00+08:00 
and time<=2018-09-20T23:59:59+08:00*
||Total time cost||readTsFileMetaData||readTsDeviceMetaData||readMemChunk||
|27,783|31.2+2,068|134+13,880|14.9+9,587|

 Unit: ms

(The "+" is because the step happens in both `createNewData` and 
`convertQueryDataSetByFetchSize` phases.)

As shown, the total time cost of q1 (23,998 ms) is only slightly smaller than 
that of q2 (27,783 ms). The costs of the three major steps - 
`readTsFileMetaData`, `readTsDeviceMetaData`, and `readMemChunk` - are very 
close in the two queries.

The reason is that the count query reads chunk data from disk into memory 
anyway, and in the best case uses the statistics (i.e., numOfPoints) in the 
pageHeader instead of reading the page data. However, reading page data from 
the ChunkBuffer (see `ChunkReader.nextBatch`) is not that expensive, as it is 
performed in memory. Therefore, the execution process of a count query mostly 
overlaps with that of the same query without count.

Other aggregate queries probably show similar results.

One direction for performance improvement is to avoid `readMemChunk` whenever 
the statistics in the ChunkMetaData can be used instead.
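
To make the idea concrete, here is a minimal sketch, not the actual IoTDB 
code. `CountSketch`, `ChunkMeta`, and `countByReadingChunk` are hypothetical 
names made up for illustration. It assumes each chunk's metadata records its 
start time, end time, and number of points, and that those statistics are 
trustworthy (for example, no overlapping or deleted data). A chunk that lies 
entirely inside the query time range is counted from its metadata alone; only 
a chunk that straddles a range boundary falls back to reading the chunk.

```java
import java.util.List;
import java.util.function.ToLongFunction;

final class CountSketch {

    /** Hypothetical stand-in for the per-chunk statistics in ChunkMetaData. */
    static final class ChunkMeta {
        final long startTime;   // timestamp of the first point in the chunk
        final long endTime;     // timestamp of the last point in the chunk
        final long numOfPoints; // point count recorded in the metadata

        ChunkMeta(long startTime, long endTime, long numOfPoints) {
            this.startTime = startTime;
            this.endTime = endTime;
            this.numOfPoints = numOfPoints;
        }
    }

    /**
     * Counts points with queryStart <= time <= queryEnd.
     * countByReadingChunk is a hypothetical fallback that loads the chunk
     * (the expensive readMemChunk-like step) and counts matching points.
     */
    static long count(List<ChunkMeta> chunks, long queryStart, long queryEnd,
                      ToLongFunction<ChunkMeta> countByReadingChunk) {
        long total = 0;
        for (ChunkMeta c : chunks) {
            if (c.endTime < queryStart || c.startTime > queryEnd) {
                continue; // no overlap with the query range: skip the chunk
            }
            if (c.startTime >= queryStart && c.endTime <= queryEnd) {
                total += c.numOfPoints; // fully covered: metadata is enough
            } else {
                total += countByReadingChunk.applyAsLong(c); // partial overlap
            }
        }
        return total;
    }
}
```

With a one-day time range like the one in q1, at most the chunks at the two 
ends of the range would need the fallback under these assumptions; all chunks 
in between would be answered from metadata without loading chunk data.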



