GitHub user jianqiao opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/172

    Query optimization with ExactFilter

    This is a follow-up optimization built on the facility provided by 
LIPFilters. LIP (lookahead information passing) is an optimization in which 
efficient filters (e.g. Bloom filters) are injected into 
Select/HashJoin/Aggregate operators to pre-filter their input relations.
    
    This PR strength-reduces `HashJoin`s (including inner/semi/anti joins) into 
`FilterJoin`s. The semantics of a `FilterJoin` is simple: if certain conditions 
are met, we can build a bit vector from the build side and use the bit vector 
to _filter_ the probe side.
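
    As a rough illustration of the semantics described above, here is a minimal 
Python sketch (not Quickstep code). The function names and the assumption that 
join keys are integers in a known range `[min_key, max_key]` are hypothetical; 
the PR text does not spell out the exact conditions.

```python
def build_exact_filter(build_keys, min_key, max_key):
    """Mark every join-key value that occurs on the build side.

    Assumption (not from the PR): keys are integers in [min_key, max_key].
    One byte per key is used here for simplicity; a real bit vector would
    pack eight keys into each byte.
    """
    bits = bytearray(max_key - min_key + 1)
    for key in build_keys:
        bits[key - min_key] = 1
    return bits


def filter_probe(probe_rows, key_of, bits, min_key, anti=False):
    """Keep probe rows whose key is present in the filter.

    With anti=True the test is inverted, which mimics an anti join.
    """
    kept = []
    for row in probe_rows:
        idx = key_of(row) - min_key
        hit = 0 <= idx < len(bits) and bits[idx] == 1
        if hit != anti:
            kept.append(row)
    return kept


# Hypothetical data: build-side keys and (key, payload) probe rows.
example_bits = build_exact_filter([3, 5, 8], min_key=0, max_key=10)
example_probe = [(1, "a"), (3, "b"), (5, "c"), (9, "d")]
semi_rows = filter_probe(example_probe, lambda r: r[0], example_bits, 0)
anti_rows = filter_probe(example_probe, lambda r: r[0], example_bits, 0, anti=True)
```

    Because the filter is exact (one slot per possible key), it yields no false 
positives, unlike a Bloom filter; the sketch only covers cases where no 
build-side columns are needed in the output.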
    
    The execution part is slightly more optimized: a `FilterJoin` is not 
always converted into a `SelectOperator` plus a `LIPFilter`, as its semantics 
would suggest. Instead, in most situations we can avoid creating the 
`SelectOperator` by attaching the `LIPFilter` to a downstream operator, 
thereby avoiding unnecessary materialization of intermediate relations.
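
    The difference between the two execution strategies can be sketched as 
follows. This is an illustrative Python sketch, not the Quickstep 
implementation: both function names are hypothetical, and a `frozenset` stands 
in for the filter.

```python
def aggregate_via_select(rows, key_filter):
    """Plan A: a separate select step materializes the filtered rows first."""
    intermediate = [row for row in rows if row[0] in key_filter]  # extra copy
    return sum(value for _, value in intermediate)


def aggregate_with_attached_filter(rows, key_filter):
    """Plan B: the downstream operator applies the filter while it scans."""
    total = 0
    for key, value in rows:
        if key in key_filter:  # filter check happens inside the consumer
            total += value
    return total


# Hypothetical (key, value) rows and a stand-in filter over the keys.
sample_rows = [(1, 10), (3, 20), (5, 30), (9, 40)]
sample_filter = frozenset({3, 5})
plan_a_total = aggregate_via_select(sample_rows, sample_filter)
plan_b_total = aggregate_with_attached_filter(sample_rows, sample_filter)
```

    Both plans return the same result; Plan B simply never builds the 
intermediate list.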
    
    
    The table below shows the performance improvement on SSB at scale factor 
100, measured on a cloudlab machine:
    
    **SSB SF100**|**master (ms)**|**w/ ExactFilter (ms)**
    :-----:|:-----:|:-----:
    Q01|709|574
    Q02|648|593
    Q03|605|564
    Q04|906|675
    Q05|754|457
    Q06|498|549
    Q07|1687|1696
    Q08|598|591
    Q09|481|470
    Q10|450|442
    Q11|1208|882
    Q12|876|656
    Q13|515|475
    Total|9937|8625
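
    For readers who want per-query ratios, the following Python snippet is one 
way to derive them from the table above. The timings are copied verbatim from 
the table; the helper name `speedup` is just for illustration.

```python
# Timings in ms, copied from the table: query -> (master, with ExactFilter).
timings = {
    "Q01": (709, 574), "Q02": (648, 593), "Q03": (605, 564),
    "Q04": (906, 675), "Q05": (754, 457), "Q06": (498, 549),
    "Q07": (1687, 1696), "Q08": (598, 591), "Q09": (481, 470),
    "Q10": (450, 442), "Q11": (1208, 882), "Q12": (876, 656),
    "Q13": (515, 475),
}
reported_total = (9937, 8625)  # the "Total" row as reported in the table


def speedup(before_ms, after_ms):
    """Ratio of old time to new time; values above 1.0 mean faster."""
    return before_ms / after_ms


# Queries that got slower with ExactFilter enabled.
slower_queries = [q for q, (m, e) in timings.items() if e > m]
# Query with the largest speedup.
best_query = max(timings, key=lambda q: speedup(*timings[q]))
overall_speedup = speedup(*reported_total)

for query, (master_ms, exact_ms) in timings.items():
    print(f"{query}: {speedup(master_ms, exact_ms):.2f}x")
print(f"Total: {overall_speedup:.2f}x")
```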

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep exact-filter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/172.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #172
    