GitHub user hbdeshmukh opened a pull request: https://github.com/apache/incubator-quickstep/pull/128

    Refine estimates for estimateCardinality for TableReference

    - Use exact CatalogRelation statistics in SimpleCostModel's estimateCardinality method whenever stats on that relation are available.

    This feature particularly helps in TPC-H Q21 (scale factor 100), in which two large hash tables are built on the lineitem table (cardinality: 600M). Such large hash tables leave very little room in the buffer pool for other operators, causing evictions and a long execution time.

    In the original branch, the cost model estimates the number of entries in the hash table at 1.09B. After this PR, that estimate drops to 600M.

    Running Q21 on the CloudLab machine gave the following results (time in milliseconds):

    | Original | After changes | Speed-up |
    |----------|---------------|----------|
    | 178,787  | 12,043        | 14.85    |

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep lower-build-cardinality

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #128

----
commit e51213c615f2273896e9df07f4a1a23d201a6d46
Author: Harshad Deshmukh <hbdeshm...@apache.org>
Date:   2016-11-05T15:04:07Z

    Refine estimates for estimateCardinality for TableReference

    - Use exact CatalogRelation statistics in SimpleCostModel's
    estimateCardinality method, whenever stats on that relation are
    available.

----
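
For readers skimming the thread, the idea described in this pull request (prefer the exact tuple count from the catalog when it is available, otherwise fall back to an estimate) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: `RelationStats`, `Relation`, `estimateTableCardinality`, and the block-based fallback formula are all hypothetical names and choices made for this example, and they are not Quickstep's actual CatalogRelation or SimpleCostModel API. The sketch assumes C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical stand-in for per-relation catalog statistics.
// NOT Quickstep's real CatalogRelation statistics interface.
struct RelationStats {
  // Holds a value only when exact statistics have been collected.
  std::optional<std::size_t> exact_num_tuples;
};

// Hypothetical, simplified view of a stored relation.
struct Relation {
  RelationStats stats;
  std::size_t num_blocks;
  // Assumed coarse per-block capacity used by the fallback estimate.
  std::size_t estimated_tuples_per_block;
};

// Returns the exact tuple count when the catalog has one; otherwise
// falls back to a coarse (and possibly much larger) block-based estimate.
// The fallback formula is an assumption made for this sketch.
std::size_t estimateTableCardinality(const Relation &relation) {
  if (relation.stats.exact_num_tuples.has_value()) {
    return *relation.stats.exact_num_tuples;
  }
  return relation.num_blocks * relation.estimated_tuples_per_block;
}
```

The point of the design is that a smaller, exact cardinality lets the hash table be sized closer to what it actually needs, which leaves more of the buffer pool for other operators. In this sketch a relation whose statistics are missing still gets an answer, just a coarser one.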