kadirozde commented on pull request #1256: URL: https://github.com/apache/phoenix/pull/1256#issuecomment-914695124
@lhofhansl, @comnetwork, I have done some performance testing on a cluster with 15 region servers. I created a data table with 16 million rows. Each row is about 2500 bytes. The row key of this table is composed of four fields (VARCHAR, INTEGER, TIMESTAMP, VARCHAR). I ran the same test without an index, with a covered index, and with an uncovered index. The timestamp field is indexed. The query used in the test returned N rows that fall into a supplied timestamp range, where N is supplied as the limit parameter. The query returns four fields. The query times in ms are as follows:

| limit | covered | uncovered | no index |
|---:|---:|---:|---:|
| 1 | 212 | 252 | 4404 |
| 10 | 215 | 256 | 5375 |
| 100 | 215 | 310 | 5169 |
| 1000 | 232 | 1125 | 4698 |
| 10000 | 433 | 7325 | 6440 |
| 100000 | 1588 | 67002 | 6789 |

It is clear that if the number of selected rows is large (in this case, 10000 or more), the uncovered index starts to perform worse than the full table scan. Not sure if these results are generalizable. Instead of using an uncovered index by default, I will add logic to use an uncovered index only if it is given as a hint.
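For readers following along, here is a minimal Phoenix SQL sketch of what a test like this could look like. The actual schema is not shown in the comment, so all table, column, and index names below are hypothetical, as are the payload column and the literal timestamp range; the point is only to illustrate what makes an index covered versus uncovered for the timed query.

```sql
-- Hypothetical schema: four-field composite row key
-- (VARCHAR, INTEGER, TIMESTAMP, VARCHAR) plus a payload column
-- standing in for the ~2500 bytes per row.
CREATE TABLE perf_test (
    tenant_id  VARCHAR NOT NULL,
    seq_num    INTEGER NOT NULL,
    event_ts   TIMESTAMP NOT NULL,
    bucket     VARCHAR NOT NULL,
    payload    VARCHAR,
    CONSTRAINT pk PRIMARY KEY (tenant_id, seq_num, event_ts, bucket)
);

-- Covered case: payload is stored in the index rows via INCLUDE, so the
-- query below can be served entirely from the index table.
CREATE INDEX covered_idx ON perf_test (event_ts) INCLUDE (payload);

-- Uncovered case: the index stores only event_ts plus the data row key, so
-- each matching index row needs a lookup back to the data table for payload.
-- (In the real test each variant would be run separately; with both indexes
-- present the optimizer would prefer the covered one.)
CREATE INDEX uncovered_idx ON perf_test (event_ts);

-- The timed query shape: N rows within a timestamp range, N given via LIMIT.
SELECT tenant_id, seq_num, event_ts, payload
FROM perf_test
WHERE event_ts BETWEEN TO_TIMESTAMP('2021-09-01 00:00:00')
                   AND TO_TIMESTAMP('2021-09-02 00:00:00')
LIMIT 1000;

-- With the proposed change, the uncovered index would be used only when
-- requested explicitly via Phoenix's index hint syntax:
SELECT /*+ INDEX(perf_test uncovered_idx) */
       tenant_id, seq_num, event_ts, payload
FROM perf_test
WHERE event_ts BETWEEN TO_TIMESTAMP('2021-09-01 00:00:00')
                   AND TO_TIMESTAMP('2021-09-02 00:00:00')
LIMIT 1000;
```

The crossover around a limit of 10000 in the numbers above is consistent with the per-row data-table lookups the uncovered plan incurs, which is what motivates gating it behind a hint rather than enabling it by default.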
