Anis Nasir created FLINK-1725:
---------------------------------
Summary: New Partitioner for better load balancing for skewed data
Key: FLINK-1725
URL: https://issues.apache.org/jira/browse/FLINK-1725
Project: Flink
Issue Type: Improvement
Components: New Components
Affects Versions: 0.8.1
Reporter: Anis Nasir
Hi,
We have recently studied the problem of load balancing in Storm [1].
In particular, we focused on key distribution of the stream for skewed skewede
data.
We developed a new stream partitioning scheme (which we call Partial Key
Grouping). It achieves better load balancing than key grouping while being more
scalable than shuffle grouping in terms of memory.
In the paper we show a number of mining algorithms that are easy to implement
with partial key grouping, and whose performance can benefit from it. We think
that it might also be useful for a larger class of algorithms.
Partial key grouping is very easy to implement: it requires just a few lines of
code in Java when implemented as a custom grouping in Storm [2].
For all these reasons, we believe it will be a nice addition to the standard
Partitioners available in Flink. If the community thinks it's a good idea, we
will be happy to offer support in the porting.
References:
[1].
https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
[2]. https://github.com/gdfm/partial-key-grouping
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)