[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

ASF GitHub Bot (JIRA) Thu, 06 Aug 2015 19:19:35 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661203#comment-14661203
 ]


ASF GitHub Bot commented on FLINK-2240:
---------------------------------------

Github user ChengXiangLi commented on the pull request:

    https://github.com/apache/flink/pull/888#issuecomment-128563750
  
    Thanks for the review, @StephanEwen. I'm very interested in this project 
and would like to contribute more. @vasia, I think Stephan has already 
answered the question: the most important reason is that I want to reuse the 
memory occupied by the hash table buckets. Besides, since this is a 
performance-sensitive issue, I tried to make this bloom filter as simple and 
efficient as I can. For example, the hash code of the join key is already 
generated and stored in the hybrid hash join, so I just reuse that hash code 
instead of generating it again from the join key value inside the bloom filter.
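
    The two design points in this comment (reusing existing memory, and 
reusing the stored hash code) can be illustrated with a minimal sketch. This 
is not the code from the pull request. The class name ReusedHashBloomFilter, 
the plain byte[] standing in for hash table bucket memory, and the way the 
probe positions are derived from one hash code are all assumptions made for 
illustration.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only, not Flink's actual class. The filter writes its
 * bits into a caller-supplied byte array (a stand-in for bucket memory) and
 * takes an already computed 32-bit hash code instead of hashing the key again.
 */
final class ReusedHashBloomFilter {
    private final byte[] bits;   // caller-owned memory, reused rather than allocated here
    private final long numBits;  // total number of bit positions available
    private final int numProbes; // bit positions set or checked per element

    ReusedHashBloomFilter(byte[] reusedMemory, int numProbes) {
        if (reusedMemory.length == 0 || numProbes <= 0) {
            throw new IllegalArgumentException("need non-empty memory and at least one probe");
        }
        this.bits = reusedMemory;
        this.numBits = (long) reusedMemory.length * 8;
        this.numProbes = numProbes;
        Arrays.fill(bits, (byte) 0); // discard whatever the memory held before
    }

    /** Records an element by its precomputed hash code. */
    void add(int hashCode) {
        for (int i = 0; i < numProbes; i++) {
            long pos = position(hashCode, i);
            int bit = (int) (pos & 7);
            bits[(int) (pos >>> 3)] |= (byte) (1 << bit);
        }
    }

    /** Returns false only if the hash code was definitely never added. */
    boolean mightContain(int hashCode) {
        for (int i = 0; i < numProbes; i++) {
            long pos = position(hashCode, i);
            int bit = (int) (pos & 7);
            if ((bits[(int) (pos >>> 3)] & (1 << bit)) == 0) {
                return false;
            }
        }
        return true;
    }

    /** Derives the i-th bit position from the single hash code (assumed mixing scheme). */
    private long position(int hashCode, int i) {
        int h2 = Integer.rotateLeft(hashCode * 0x9E3779B9, 15) | 1; // second hash, forced odd
        int combined = hashCode + i * h2;
        return Integer.toUnsignedLong(combined) % numBits;
    }
}
```

    In this sketch the caller passes the stored hash code to add() on the 
build side and to mightContain() on the probe side, so the join key itself is 
never touched by the filter.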


> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-2240
>                 URL: https://issues.apache.org/jira/browse/FLINK-2240
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Minor
>             Fix For: 0.10
>
>
> In Hybrid-Hash-Join, when the small table does not fit into memory, part of 
> the small table data is spilled to disk, and the corresponding partitions of 
> the big table data are spilled to disk in the probe phase as well. If we build 
> a BloomFilter while spilling the small table to disk during the build phase, 
> and use it to filter the big table records that would otherwise be spilled to 
> disk, this may greatly reduce the size of the spilled big table files and save 
> the disk I/O cost of writing and later reading them.
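
The flow described in the issue can be sketched as follows. This is a 
simplified illustration, not Flink's implementation: the class 
SpillFilterSketch and its methods are hypothetical, records are reduced to 
their int hash codes, a java.util.BitSet stands in for the filter memory, and 
the join is assumed to be an inner join, so a probe record with no possible 
build-side match can be dropped instead of spilled.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of filtering probe-side records before spilling them. */
final class SpillFilterSketch {

    /** Build phase: mark the hash of every build-side record written to the spilled partition. */
    static BitSet buildFilter(int[] spilledBuildHashes, int numBits) {
        BitSet filter = new BitSet(numBits);
        for (int h : spilledBuildHashes) {
            filter.set(firstPos(h, numBits));
            filter.set(secondPos(h, numBits));
        }
        return filter;
    }

    /** Probe phase: keep only the hashes that may have a match in the spilled build partition. */
    static List<Integer> probeHashesToSpill(int[] probeHashes, BitSet filter, int numBits) {
        List<Integer> toSpill = new ArrayList<>();
        for (int h : probeHashes) {
            boolean mayMatch = filter.get(firstPos(h, numBits)) && filter.get(secondPos(h, numBits));
            if (mayMatch) {
                toSpill.add(h); // possible match (or false positive): must be spilled
            }
            // otherwise: definitely no build-side partner, so nothing is written to disk
        }
        return toSpill;
    }

    // Two bit positions derived from one hash code (assumed mixing, for illustration only).
    private static int firstPos(int h, int numBits) {
        return Math.floorMod(h, numBits);
    }

    private static int secondPos(int h, int numBits) {
        return Math.floorMod(Integer.rotateLeft(h, 16) * 0x9E3779B9, numBits);
    }
}
```

Because a Bloom filter has no false negatives, every probe record that 
really has a partner in the spilled build partition is still spilled. Only 
the number of unnecessary writes depends on the false positive rate.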



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
