[ https://issues.apache.org/jira/browse/FLINK-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chengxiang Li updated FLINK-2241:
---------------------------------
    Summary: Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join  (was: Use BloomFilter to minimize build side records which spilled to disk in Hybrid-Hash-Join)

> Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-2241
>                 URL: https://issues.apache.org/jira/browse/FLINK-2241
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chengxiang Li
>            Priority: Minor
>
> In Hybrid-Hash-Join, when the small table does not fit into memory, part of
> the small table data is spilled to disk, and the corresponding partitions of
> the big table are spilled to disk in the probe phase as well. If we build a
> BloomFilter while spilling the small table to disk during the build phase, and
> use it to filter the big table records that would otherwise be spilled to
> disk, this may greatly reduce the size of the spilled big table files and save
> the disk IO cost of writing them and reading them back later.
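
The idea in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not Flink's actual implementation: the class and method names (SpillBloomFilterSketch, SimpleBloomFilter, filterProbeKeys) are hypothetical, join keys are simplified to long values, and the join is assumed to be an inner join, so a probe record whose key is definitely absent from the spilled build side can be dropped instead of spilled.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch; none of these names come from Flink.
final class SpillBloomFilterSketch {

    /** Minimal Bloom filter over long join keys, using double hashing. */
    static final class SimpleBloomFilter {
        private static final long SEED = 0x9E3779B97F4A7C15L;
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        /** Called for every build side key that is written to a spilled partition. */
        void add(long key) {
            int h1 = mix(key);
            int h2 = mix(key ^ SEED) | 1; // odd step so the probed positions differ
            for (int i = 0; i < numHashes; i++) {
                bits.set(Math.floorMod(h1 + i * h2, numBits));
            }
        }

        /** False means the key was definitely never added; true may be a false positive. */
        boolean mightContain(long key) {
            int h1 = mix(key);
            int h2 = mix(key ^ SEED) | 1;
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                    return false;
                }
            }
            return true;
        }

        /** Simple 64-bit to 32-bit mixing function (an assumption, any good hash works). */
        private static int mix(long x) {
            x ^= x >>> 33;
            x *= 0xff51afd7ed558ccdL;
            x ^= x >>> 33;
            return (int) x;
        }
    }

    /**
     * Build phase: record every spilled build key in the filter.
     * Probe phase: keep only the probe keys that might match a spilled build key;
     * all other probe keys would be dropped instead of being written to disk.
     */
    static List<Long> filterProbeKeys(long[] spilledBuildKeys, long[] probeKeys,
                                      int numBits, int numHashes) {
        SimpleBloomFilter filter = new SimpleBloomFilter(numBits, numHashes);
        for (long key : spilledBuildKeys) {
            filter.add(key);
        }
        List<Long> toSpill = new ArrayList<>();
        for (long key : probeKeys) {
            if (filter.mightContain(key)) {
                toSpill.add(key);
            }
        }
        return toSpill;
    }

    public static void main(String[] args) {
        long[] build = {1, 2, 3, 4, 5};
        long[] probe = new long[1000];
        for (int i = 0; i < probe.length; i++) {
            probe[i] = i;
        }
        List<Long> toSpill = filterProbeKeys(build, probe, 1 << 12, 3);
        System.out.println("probe records spilled: " + toSpill.size() + " of " + probe.length);
    }
}
```

Because a Bloom filter has no false negatives, every probe record that has a match on the spilled build side is still spilled. False positives only cost some extra spilled records, so the filter size trades memory against the amount of disk IO saved.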