GitHub user linwen opened a pull request: https://github.com/apache/incubator-hawq/pull/1360
HAWQ-1607. This commit applies the Bloom filter while scanning the outer table.

1. Push down the Bloom filter structure to the outer table scan (only Parquet is supported).
2. Check whether each tuple from the outer table is found in the Bloom filter structure.
3. Add a GUC, hawq_hashjoin_bloomfilter_sampling_number. This GUC controls the Bloom filter sampling number: while scanning the outer table, if the ratio for the first N tuples is larger than hawq_hashjoin_bloomfilter_ratio, the remaining tuples will not be checked by the Bloom filter.
4. The Bloom filter won't be created in two cases: (1) there is any expression on the outer join keys other than a T_Var (projection), such as fact.c1 + 1 = dim.c1; (2) there are multiple join keys, e.g. fact.c1 = dim.c1 and fact.c2 = dim.c2. These cases involve pushing down expression and projection information to the scan, which will be implemented later.

Please review, thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/linwen/incubator-hawq hawq_1607v2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/1360.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1360

----
commit 08a0951af95ce4945cc67a5c7bc67acdc4e9b94e
Author: Wen Lin <wlin@...>
Date:   2018-05-06T13:19:14Z

    HAWQ-1607. This commit applies the Bloom filter while scanning the outer table; test cases will be added with HAWQ-1608.

    1. Push down the Bloom filter structure to the outer table scan (only Parquet is supported).
    2. Check whether each tuple from the outer table is found in the Bloom filter structure.
    3. Add a GUC, hawq_hashjoin_bloomfilter_sampling_number. This GUC controls the Bloom filter sampling number: while scanning the outer table, if the ratio for the first N tuples is larger than hawq_hashjoin_bloomfilter_ratio, the remaining tuples will not be checked by the Bloom filter.
    4. The Bloom filter won't be created in two cases: (1) there is any expression on the outer join keys other than a T_Var (projection), such as fact.c1 + 1 = dim.c1; (2) there are multiple join keys, e.g. fact.c1 = dim.c1 and fact.c2 = dim.c2. These cases involve pushing down expression and projection information to the scan, which will be implemented later.

----
---
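
For readers unfamiliar with the sampling idea in item 3, here is a minimal Python sketch of how such a check might behave. It is not the HAWQ implementation (which is C code inside the executor). The names BloomFilter, scan_outer, sampling_number and ratio_threshold are hypothetical, and it assumes that "the ratio" means the fraction of the first N sampled outer tuples that pass the Bloom filter, so that a filter rejecting too few tuples is switched off for the rest of the scan.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over join keys (illustrative only, not HAWQ's)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def scan_outer(outer_keys, bloom, sampling_number, ratio_threshold):
    """Yield outer keys that may have a match on the inner side.

    Assumption: "ratio" is passed / checked over the first
    sampling_number tuples. Above ratio_threshold the filter is judged
    unselective and the remaining tuples skip the check entirely.
    """
    checked = passed = 0
    use_filter = True
    for key in outer_keys:
        if not use_filter:
            yield key  # filter disabled: every tuple goes to the join
            continue
        checked += 1
        if bloom.might_contain(key):
            passed += 1
            yield key
        if checked == sampling_number and passed / checked > ratio_threshold:
            use_filter = False
```

In this sketch the filter is built from the inner (dim) join keys before the outer (fact) scan starts. Once it is disabled, non-matching tuples are simply left for the hash join itself to reject, so the result of the join does not change; only the cost of the per-tuple check is saved.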