[ 
https://issues.apache.org/jira/browse/FLINK-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated FLINK-2241:
---------------------------------
    Summary: Use BloomFilter to minimize probe side records which are spilled to 
disk in Hybrid-Hash-Join (was: Use BloomFilter to minimize build side records 
which spilled to disk in Hybrid-Hash-Join)

> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-2241
>                 URL: https://issues.apache.org/jira/browse/FLINK-2241
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chengxiang Li
>            Priority: Minor
>
> In Hybrid-Hash-Join, when the small table does not fit into memory, part of 
> the small table data is spilled to disk, and the corresponding partitions of 
> the big table are spilled to disk during the probe phase as well. If we build 
> a BloomFilter while spilling the small table to disk during the build phase, 
> and use it to filter the big table records that would otherwise be spilled to 
> disk, this may greatly reduce the size of the spilled big table files and 
> save the disk IO cost of writing them and reading them back later.
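
The idea described above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not Flink's implementation: the class
and method names (SpillBloomFilterSketch, buildFilter, probeKeysToSpill) are
hypothetical, join keys are assumed to be plain long values, an inner join is
assumed, and the probe keys passed in are assumed to be those that already
hash to a spilled partition. Because a Bloom filter has no false negatives, a
probe record rejected by the filter cannot match any spilled build record and
does not need to be written to disk.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch; none of these names come from Flink. */
class SpillBloomFilterSketch {

    /** Minimal Bloom filter over long join keys, using double hashing. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(long key) {
            long h = mix(key);
            int h1 = (int) h;
            int h2 = (int) (h >>> 32);
            for (int i = 0; i < numHashes; i++) {
                bits.set(Math.floorMod(h1 + i * h2, numBits));
            }
        }

        /** False means "definitely absent"; true means "possibly present". */
        boolean mightContain(long key) {
            long h = mix(key);
            int h1 = (int) h;
            int h2 = (int) (h >>> 32);
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                    return false;
                }
            }
            return true;
        }

        /** 64-bit mixing step (the MurmurHash3 finalizer). */
        private static long mix(long x) {
            x ^= x >>> 33;
            x *= 0xff51afd7ed558ccdL;
            x ^= x >>> 33;
            x *= 0xc4ceb9fe1a85ec53L;
            x ^= x >>> 33;
            return x;
        }
    }

    /** Build phase: record every build-side key that goes to a spilled partition. */
    static BloomFilter buildFilter(long[] spilledBuildKeys, int numBits, int numHashes) {
        BloomFilter filter = new BloomFilter(numBits, numHashes);
        for (long key : spilledBuildKeys) {
            filter.add(key);
        }
        return filter;
    }

    /** Probe phase: keep only the probe keys that may match a spilled build record. */
    static List<Long> probeKeysToSpill(long[] probeKeys, BloomFilter filter) {
        List<Long> toSpill = new ArrayList<>();
        for (long key : probeKeys) {
            if (filter.mightContain(key)) {
                toSpill.add(key); // possible match: must still be written to disk
            }
            // otherwise: no spilled build record has this key, so skip the write
        }
        return toSpill;
    }

    public static void main(String[] args) {
        BloomFilter filter = buildFilter(new long[] {1L, 2L, 3L}, 1024, 3);
        long[] probeKeys = {1L, 2L, 3L, 40L, 50L, 60L};
        List<Long> toSpill = probeKeysToSpill(probeKeys, filter);
        System.out.println("probe records: " + probeKeys.length
                + ", spilled after filtering: " + toSpill.size());
    }
}
```

How much is saved depends on the join selectivity and on the filter's
false-positive rate, which in turn depends on the number of bits and hash
functions chosen relative to the number of spilled build keys.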



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
