Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/888#issuecomment-119136367
  
    This is a very nice idea, thank you for the contribution! The numbers look quite encouraging.
    
    I need to look into this carefully, as it touches a very sensitive part of the system. It will probably take me a bit of time.
    
    Here are some initial comments:
    
      - The integration tests seem to be failing; this change apparently triggers a stack overflow at some point. Have a look at the logs of the Travis CI build.
    
      - Can we add a flag to the hash table to enable/disable the bloom filters? That would make future comparisons easier.
    
      - Could you include a standalone mini benchmark, similar to the one that produced the numbers you posted here? Something simple, like a standalone Java executable that creates the hash table and feeds generated records through it, once with bloom filters activated and once with them deactivated. It would not start a full Flink cluster, but only test the HashJoin in isolation.
    We like to include such mini benchmarks for performance-critical parts and re-run them once in a while to check how performance has evolved.
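    A rough sketch of the shape such a benchmark could take, in plain Java. It does not use Flink's actual hash table classes: the class name `HashJoinBloomBenchmark`, the `useBloomFilter` flag, and the `BitSet`-based Bloom filter are hypothetical stand-ins, so the timings it prints say nothing about the real HashJoin. It only illustrates the setup being asked for: generated records, the filter toggled on and off, and no cluster.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical standalone benchmark; NOT Flink's HashJoin / hash table API.
public class HashJoinBloomBenchmark {

    /** Minimal Bloom filter over int keys; k bit positions via double hashing. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // MurmurHash3 32-bit finalizer (fmix32) to spread the key bits.
        private static int mix(int x) {
            x ^= x >>> 16;
            x *= 0x85ebca6b;
            x ^= x >>> 13;
            x *= 0xc2b2ae35;
            x ^= x >>> 16;
            return x;
        }

        void add(int key) {
            int h1 = mix(key);
            int h2 = mix(h1) | 1; // odd step for double hashing
            for (int i = 0; i < numHashes; i++) {
                bits.set(Math.floorMod(h1 + i * h2, numBits));
            }
        }

        /** False means "definitely absent"; true means "possibly present". */
        boolean mightContain(int key) {
            int h1 = mix(key);
            int h2 = mix(h1) | 1;
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Builds on buildKeys, probes with probeKeys, returns the number of matching pairs. */
    static long join(int[] buildKeys, int[] probeKeys, boolean useBloomFilter) {
        Map<Integer, Integer> table = new HashMap<>(); // key -> number of build rows
        BloomFilter bloom = useBloomFilter
                ? new BloomFilter(Math.max(64, buildKeys.length * 8), 3) // ~8 bits per key
                : null;
        for (int k : buildKeys) {
            table.merge(k, 1, Integer::sum);
            if (bloom != null) {
                bloom.add(k);
            }
        }
        long matches = 0;
        for (int k : probeKeys) {
            if (bloom != null && !bloom.mightContain(k)) {
                continue; // filter says no match; skip the table lookup
            }
            Integer count = table.get(k);
            if (count != null) {
                matches += count;
            }
        }
        return matches;
    }

    /** Deterministic generated keys, so runs are comparable. */
    static int[] randomKeys(int n, int bound, long seed) {
        Random rnd = new Random(seed);
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = rnd.nextInt(bound);
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] build = randomKeys(100_000, 1_000_000, 42L);
        int[] probe = randomKeys(1_000_000, 10_000_000, 7L);
        for (boolean bloom : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long matches = join(build, probe, bloom);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("bloom=%b matches=%d time=%d ms%n", bloom, matches, ms);
        }
    }
}
```

    Since a Bloom filter has no false negatives, both runs must report the same match count; only the time should differ. For numbers that are stable enough to compare across versions, a harness like JMH would handle warm-up and repetition better than a single `System.nanoTime()` measurement.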
