GitHub user linwen opened a pull request:

    https://github.com/apache/incubator-hawq/pull/1360

    HAWQ-1607. This commit applies the Bloom filter while scanning the outer
    table

    1. Push down the Bloom filter structure to the outer table scan (only
       Parquet is supported).
    2. Check whether each tuple from the outer table is found in the Bloom
       filter structure.
    3. Add a GUC, hawq_hashjoin_bloomfilter_sampling_number, which controls
       the Bloom filter sampling number. While scanning the outer table, if
       the ratio for the first N tuples is larger than
       hawq_hashjoin_bloomfilter_ratio, the remaining tuples will not be
       checked against the Bloom filter.
    4. The Bloom filter won't be created in two cases: (1) there is any
       expression on the outer join keys other than T_Var (projection), such
       as fact.c1 + 1 = dim.c1; (2) there are multiple join keys, e.g.
       fact.c1 = dim.c1 and fact.c2 = dim.c2. These cases involve pushing
       down expression and projection information to the scan, which will be
       implemented later.
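
    For reviewers, the sampling behavior described in item 3 can be sketched
    roughly as follows. This is a standalone illustration, not the code in
    this patch: ScanBloomFilter, bloom_init, bloom_add, bloom_maybe_contains,
    scan_keep_tuple, the hash functions, and the filter size are all
    hypothetical names and choices. It also assumes that "the ratio" means
    the fraction of sampled outer tuples that pass the Bloom filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   1024u  /* hypothetical filter size in bits */
#define BLOOM_HASHES 3u     /* hypothetical number of probes per key */

/* Hypothetical scan-side state: the bit array plus sampling counters. */
typedef struct {
    uint8_t  bits[BLOOM_BITS / 8];
    uint64_t sampled;   /* outer tuples examined during sampling */
    uint64_t passed;    /* sampled tuples that passed the filter */
    bool     disabled;  /* true once the filter is judged ineffective */
} ScanBloomFilter;

/* Two simple integer mixers used for double hashing (illustrative only). */
static uint32_t mix1(uint32_t k) {
    k ^= k >> 16; k *= 0x85ebca6bu;
    k ^= k >> 13; k *= 0xc2b2ae35u;
    k ^= k >> 16;
    return k;
}
static uint32_t mix2(uint32_t k) { return (k * 2654435761u) | 1u; }

static void bloom_init(ScanBloomFilter *f) {
    assert(f != NULL);
    memset(f, 0, sizeof(*f));
}

/* Build side: record a join key from the inner table. */
static void bloom_add(ScanBloomFilter *f, uint32_t key) {
    uint32_t h1 = mix1(key), h2 = mix2(key);
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t pos = (h1 + i * h2) % BLOOM_BITS;
        f->bits[pos / 8] |= (uint8_t)(1u << (pos % 8));
    }
}

/* False means the key is definitely absent; true means it may be present. */
static bool bloom_maybe_contains(const ScanBloomFilter *f, uint32_t key) {
    uint32_t h1 = mix1(key), h2 = mix2(key);
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t pos = (h1 + i * h2) % BLOOM_BITS;
        if (!(f->bits[pos / 8] & (1u << (pos % 8))))
            return false;
    }
    return true;
}

/*
 * Scan side: decide whether an outer tuple should be passed up to the join.
 * sampling_number and ratio stand in for the two GUC values. After the
 * first sampling_number tuples, if the pass fraction exceeds ratio (an
 * assumed definition), the filter is switched off for the remaining tuples.
 */
static bool scan_keep_tuple(ScanBloomFilter *f, uint32_t key,
                            uint64_t sampling_number, double ratio) {
    if (f->disabled)
        return true;  /* filter judged ineffective: stop checking */

    bool hit = bloom_maybe_contains(f, key);
    if (f->sampled < sampling_number) {
        f->sampled++;
        if (hit)
            f->passed++;
        if (f->sampled == sampling_number &&
            (double)f->passed / (double)f->sampled > ratio)
            f->disabled = true;
    }
    return hit;
}
```

    The point of the sampling step in this sketch is that a filter which
    lets most outer tuples through costs more per tuple than it saves, so
    the scan stops probing it after the first N tuples.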
    
    Please review, thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/linwen/incubator-hawq hawq_1607v2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/1360.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1360
    
----
commit 08a0951af95ce4945cc67a5c7bc67acdc4e9b94e
Author: Wen Lin <wlin@...>
Date:   2018-05-06T13:19:14Z

    HAWQ-1607. This commit applies the Bloom filter while scanning the outer
    table; test cases will be added with HAWQ-1608.
        1. Push down the Bloom filter structure to the outer table scan (only
           Parquet is supported).
        2. Check whether each tuple from the outer table is found in the Bloom
           filter structure.
        3. Add a GUC, hawq_hashjoin_bloomfilter_sampling_number, which controls
           the Bloom filter sampling number. While scanning the outer table, if
           the ratio for the first N tuples is larger than
           hawq_hashjoin_bloomfilter_ratio, the remaining tuples will not be
           checked against the Bloom filter.
        4. The Bloom filter won't be created in two cases: (1) there is any
           expression on the outer join keys other than T_Var (projection),
           such as fact.c1 + 1 = dim.c1; (2) there are multiple join keys, e.g.
           fact.c1 = dim.c1 and fact.c2 = dim.c2. These cases involve pushing
           down expression and projection information to the scan, which will
           be implemented later.

----

