GitHub user ChengXiangLi opened a pull request:

    https://github.com/apache/flink/pull/888

    [FLINK-2240] Use BloomFilter to filter probe records in Hybrid-Hash-Join

    In Hybrid-Hash-Join, when the small table does not fit into memory, part 
of the small table data is spilled to disk, and the corresponding partitions of 
the big table data are spilled to disk in the probe phase as well. If we build a 
BloomFilter while spilling the small table to disk during the build phase, and 
use it to filter the big table records that would otherwise be spilled to disk, 
this may greatly reduce the size of the spilled big table files and save the 
disk IO cost of writing them and reading them back later.
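
The idea above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this pull request: the class name `SpillBloomFilterSketch`, its methods, and the sizing parameters are all hypothetical, and the filter works on plain `int` hash codes rather than on Flink's hash table buckets and memory segments.

```java
import java.util.BitSet;

// Hypothetical sketch of a Bloom filter for spilled partitions.
// Names and sizes are assumptions for illustration only.
final class SpillBloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SpillBloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Build phase: record the hash of every small table record that is spilled.
    void add(int hash) {
        int step = mix(hash);
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(hash + i * step, numBits));
        }
    }

    // Probe phase: false means no spilled small table record can match,
    // so the big table record can be dropped instead of being spilled.
    // true may be a false positive, so the record must still be spilled.
    boolean mightContain(int hash) {
        int step = mix(hash);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(hash + i * step, numBits))) {
                return false;
            }
        }
        return true;
    }

    // Derives a second, odd hash value used as the step for double hashing.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return h | 1;
    }

    public static void main(String[] args) {
        SpillBloomFilterSketch filter = new SpillBloomFilterSketch(1 << 16, 3);
        // Pretend keys 0..999 of the small table were spilled during the build phase.
        for (int key = 0; key < 1000; key++) {
            filter.add(Integer.hashCode(key));
        }
        // Probe with keys 0..9999 and count how many would still be spilled.
        int spilled = 0;
        for (int key = 0; key < 10000; key++) {
            if (filter.mightContain(Integer.hashCode(key))) {
                spilled++;
            }
        }
        System.out.println("probe records still spilled: " + spilled + " of 10000");
    }
}
```

Because a Bloom filter never produces false negatives, dropping a probe record that fails the check cannot lose a join result; false positives only mean that some non-matching records are still written to disk.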

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ChengXiangLi/flink hj-bloomfilter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #888
    
----
commit 78c59d6ee52a00fd4964001cbce81437c38d86cb
Author: chengxiang li <[email protected]>
Date:   2015-07-03T15:53:47Z

    add bloom filter for spilled partitions in hashtable.

commit cacaa9a15a5330c6130306841ef73958490cf69d
Author: chengxiang li <[email protected]>
Date:   2015-07-06T07:15:39Z

    fix previous get buckets method

commit 6bbbb27d4935da72ae44ec404f884a74de7bbc4c
Author: chengxiang li <[email protected]>
Date:   2015-07-06T08:07:30Z

    fix  some format issues.

commit b7fee8d26445db4bba7928bfff8a9dd5ada8cd03
Author: chengxiang li <[email protected]>
Date:   2015-07-06T08:08:52Z

    Merge remote-tracking branch 'upstream/master' into hj-bloomfilter

commit d352c090b9c06baf701235809f7dfd0b4e9b87af
Author: Li <[email protected]>
Date:   2015-07-06T08:44:13Z

    add tab as indent of blank line.

commit edacfb3ae17beeb84630d73f8452629d3e19b66b
Author: Li <[email protected]>
Date:   2015-07-06T08:48:56Z

    fix tab indent for blank lines.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---