kadirozde commented on pull request #1256:
URL: https://github.com/apache/phoenix/pull/1256#issuecomment-914695124


   @lhofhansl , @comnetwork, I have done some performance testing on a cluster 
with 15 region servers. I created a data table with 16 million rows, where each 
row is about 2500 bytes. The row key of this table is composed of four fields 
(VARCHAR, INTEGER, TIMESTAMP, VARCHAR). I ran the same test without an index, 
with a covered index, and with an uncovered index; the timestamp field is the 
indexed field in both index cases. The query used in the test returns N rows 
that fall into a supplied timestamp range, where N is given as the LIMIT 
parameter, and selects four fields. The query times in ms are as follows:
   
   limit     covered   uncovered   no index
   1             212         252       4404
   10            215         256       5375
   100           215         310       5169
   1000          232        1125       4698
   10000         433        7325       6440
   100000       1588       67002       6789
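
   For concreteness, here is a sketch in Phoenix SQL of the kind of schema, 
indexes, and query described above. All table, column, and index names are 
hypothetical, and the exact column set is an assumption; the essential point is 
that the covered index carries the selected non-key column via INCLUDE, while 
the uncovered index does not.

      -- Hypothetical schema: a composite row key of
      -- (VARCHAR, INTEGER, TIMESTAMP, VARCHAR) plus a filler column that
      -- brings each row to roughly 2500 bytes.
      CREATE TABLE perf_test (
          tenant_id  VARCHAR NOT NULL,
          bucket_id  INTEGER NOT NULL,
          event_time TIMESTAMP NOT NULL,
          event_id   VARCHAR NOT NULL,
          payload    VARCHAR,
          CONSTRAINT pk PRIMARY KEY (tenant_id, bucket_id, event_time, event_id)
      );

      -- Covered case: the selected non-key column is carried in the index
      -- rows via INCLUDE, so the query is served entirely from the index.
      CREATE INDEX ts_idx_covered ON perf_test (event_time) INCLUDE (payload);

      -- Uncovered case: the same index without INCLUDE; fetching payload
      -- then requires a lookup against the data table for every matching
      -- index row.
      CREATE INDEX ts_idx ON perf_test (event_time);

      -- Query shape used in the test: four fields, a timestamp range, and a
      -- LIMIT of N.
      SELECT tenant_id, event_time, event_id, payload
      FROM perf_test
      WHERE event_time BETWEEN ? AND ?
      LIMIT 1000;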
   
   It is clear that if the number of selected rows is large (in this case 10000 
or more), the uncovered index starts to perform worse than the full table scan, 
presumably because every matching index row requires a separate lookup against 
the data table to retrieve the uncovered columns. I am not sure whether these 
results generalize. Instead of using an uncovered index by default, I will add 
logic to use an uncovered index only if it is given as a hint.
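
   As a rough illustration of the hint-gated behavior, using Phoenix's existing 
INDEX hint syntax (the actual hint the patch ends up requiring may differ; the 
names are the hypothetical ones from the sketch above):

      -- Without the hint, the planner would not pick the uncovered index;
      -- with it, ts_idx is used and the uncovered column is fetched from the
      -- data table per matching row.
      SELECT /*+ INDEX(perf_test ts_idx) */
             tenant_id, event_time, event_id, payload
      FROM perf_test
      WHERE event_time BETWEEN ? AND ?
      LIMIT 100;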
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

