Sebastian Kruse created FLINK-2193:
--------------------------------------

             Summary: Partial shuffling
                 Key: FLINK-2193
                 URL: https://issues.apache.org/jira/browse/FLINK-2193
             Project: Flink
          Issue Type: Improvement
            Reporter: Sebastian Kruse
            Priority: Minor


In some cases, it would come in handy to shuffle only some specific elements of 
a dataset instead of all elements. This is currently not achievable with a 
custom partitioner.

Use cases for such a feature are:
* Load balancing: split up elements that require high processing load and 
distribute the splits among all task managers.
* Evolutionary algorithms: A well-suited EA model for Map/Reduce-like platforms 
is the island model, where each worker maintains and evolves its own 
population. From time to time, individuals among the population need to be 
exchanged. Shuffling all the complete populations is not necessary, though.

A presumably easy way to achieve this feature could be to provide the local 
partition number in deployed partitioners, similar to 
{{RichFunction#getRuntimeContext()#getIndexOfThisSubtask()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to