GitHub user hbdeshmukh opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/128

    Refine estimates for estimateCardinality for TableReference

    - Use exact CatalogRelation statistics in SimpleCostModel's
      estimateCardinality method, whenever stats on that relation are available.
    
    This change particularly helps TPC-H Q21 (scale factor 100), in which two
    large hash tables are built on the lineitem table (cardinality: 600M). Such
    large hash tables leave very little room in the buffer pool for other
    operators, causing evictions and a long execution time.
    
    In the original branch, the cost model estimates the number of entries in
    the hash table at 1.09B. With this PR, that estimate drops to 600M. Running
    Q21 on the CloudLab machine gave the following results (time in
    milliseconds):
    
    | Original | After changes | Speed-up |
    |----------|---------------|----------|
    | 178,787  | 12,043        | 14.85    |
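
    The idea of preferring exact statistics over a heuristic can be sketched
    roughly as follows. This is only an illustrative sketch, not the code in
    this PR: RelationStats, estimateFromHeuristic, and
    estimateTableCardinality are hypothetical names, and the block-based
    fallback is an assumed stand-in for whatever estimate the cost model used
    before.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical stand-in for per-relation catalog statistics.
// This is NOT Quickstep's actual CatalogRelation statistics class.
struct RelationStats {
  // Holds a value only when an exact tuple count has been recorded.
  std::optional<std::size_t> num_tuples;
};

// Hypothetical fallback: a coarse estimate derived from block counts.
// The real cost model may compute its fallback differently.
std::size_t estimateFromHeuristic(std::size_t num_blocks,
                                  std::size_t tuples_per_block) {
  return num_blocks * tuples_per_block;
}

// Return the exact tuple count when stats are available; otherwise fall
// back to the heuristic estimate.
std::size_t estimateTableCardinality(const RelationStats &stats,
                                     std::size_t num_blocks,
                                     std::size_t tuples_per_block) {
  if (stats.num_tuples.has_value()) {
    return *stats.num_tuples;
  }
  return estimateFromHeuristic(num_blocks, tuples_per_block);
}
```

    With exact stats present, the function returns the recorded count and
    ignores the block-based inputs, so a hash table sized from this estimate
    is no larger than the relation requires.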

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep lower-build-cardinality

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #128
    
----
commit e51213c615f2273896e9df07f4a1a23d201a6d46
Author: Harshad Deshmukh <hbdeshm...@apache.org>
Date:   2016-11-05T15:04:07Z

    Refine estimates for estimateCardinality for TableReference
    
    - Use exact CatalogRelation statistics in SimpleCostModel's
      estimateCardinality method, whenever stats on that relation are available.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
