Github user anisnasir commented on the pull request:
https://github.com/apache/flink/pull/1069#issuecomment-135905274
@tillrohrmann you are absolutely right that under high skew the most
frequent keys require more than two workers. However, most real-world
datasets do not exhibit such high skew [1] and can be handled by splitting
each key across just two workers.
I was planning to write a wordcount example using both the HashPartitioner
and the PartialPartitioner. Could you explain in a bit more detail how one
could verify that the skew of the input data decreases after partitioning?
[1].
https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
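To make the discussion concrete, here is a minimal, self-contained sketch (not Flink API; all class and method names are illustrative assumptions) of the "power of both choices" idea from [1]: each key hashes to two candidate workers and every tuple is routed to the currently less loaded of the two. One way to check whether skew decreases is to compare the load imbalance (max worker load divided by average load) against plain hash partitioning on a skewed synthetic stream:

```java
import java.util.Random;

public class PartialPartitionerSketch {

    // Two cheap hash functions, distinguished by a seed (assumed scheme).
    static int hashMod(int key, int seed, int n) {
        return Math.floorMod(key * 31 + seed, n);
    }

    // Hash partitioning: all occurrences of a key land on one worker.
    static long[] hashPartition(int[] keys, int workers) {
        long[] load = new long[workers];
        for (int k : keys) load[hashMod(k, 17, workers)]++;
        return load;
    }

    // Partial partitioning: split each key over its two candidate workers,
    // always sending the tuple to the less loaded of the two.
    static long[] partialPartition(int[] keys, int workers) {
        long[] load = new long[workers];
        for (int k : keys) {
            int a = hashMod(k, 17, workers);
            int b = hashMod(k, 101, workers);
            if (load[a] <= load[b]) load[a]++; else load[b]++;
        }
        return load;
    }

    // Imbalance = max worker load / average load (1.0 is perfectly balanced).
    static double imbalance(long[] load) {
        long max = 0, sum = 0;
        for (long l : load) { max = Math.max(max, l); sum += l; }
        return (double) max / ((double) sum / load.length);
    }

    // Returns {hash imbalance, partial imbalance} for a skewed stream
    // where one hot key accounts for roughly 90% of the tuples.
    static double[] run(int tuples, int workers) {
        Random rnd = new Random(42);
        int[] keys = new int[tuples];
        for (int i = 0; i < tuples; i++)
            keys[i] = rnd.nextInt(10) == 0 ? rnd.nextInt(1000) : 0;
        return new double[] {
            imbalance(hashPartition(keys, workers)),
            imbalance(partialPartition(keys, workers))
        };
    }

    public static void main(String[] args) {
        double[] r = run(100_000, 8);
        System.out.println("hash imbalance:    " + r[0]);
        System.out.println("partial imbalance: " + r[1]);
    }
}
```

Note that the sketch also illustrates the limitation raised above: with one extremely hot key, partial partitioning roughly halves the maximum load but cannot do better, since the hot key is still confined to its two candidate workers.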