[ 
https://issues.apache.org/jira/browse/FLINK-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595725#comment-14595725
 ] 

Stephan Ewen commented on FLINK-2193:
-------------------------------------

I think there is currently no partition function that can access the runtime 
context.

For now, this looks like a fairly special case. Can you simulate it by 
computing the target partition in a {{RichMapFunction}} and then using an 
identity partitioner to decide the target channel?
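
A minimal sketch of that workaround, written in plain Java so it runs without 
Flink on the classpath. All names here ({{PartialShuffleSketch}}, {{Tagged}}, 
{{tag}}, {{identityPartition}}) are hypothetical, and hashing the shuffled 
elements is only one possible choice of target. In a real job, I would expect 
{{tag}} to live in a {{RichMapFunction}} that takes the index from 
{{getRuntimeContext().getIndexOfThisSubtask()}}, and {{identityPartition}} to 
be the body of a custom {{Partitioner<Integer>}} applied to the tag field.

```java
// Plain-Java model of the workaround; no Flink classes are used here.
final class PartialShuffleSketch {

    /** A record tagged with the partition it should be sent to. */
    static final class Tagged<T> {
        final int target;
        final T value;

        Tagged(int target, T value) {
            this.target = target;
            this.value = value;
        }
    }

    /**
     * Step 1 (the map step): decide the target partition per element.
     * subtaskIndex stands in for the index of the current parallel subtask,
     * parallelism for the number of parallel subtasks (both assumptions).
     * Elements that should not be shuffled keep their own subtask index.
     */
    static <T> Tagged<T> tag(T value, boolean shuffle, int subtaskIndex, int parallelism) {
        int target = shuffle
                ? Math.floorMod(value.hashCode(), parallelism) // spread over all partitions
                : subtaskIndex;                                // stay on the same index
        return new Tagged<>(target, value);
    }

    /**
     * Step 2 (the identity partitioner): the key already is the target
     * channel, so it is returned unchanged after a range check.
     */
    static int identityPartition(int key, int numPartitions) {
        if (key < 0 || key >= numPartitions) {
            throw new IllegalArgumentException("target partition out of range: " + key);
        }
        return key;
    }
}
```

With this, only the elements flagged for shuffling change their partition 
index; all others are addressed to the receiver with the sender's own index.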

One issue you will encounter is that the i-th receiver has no affinity to 
the location of the i-th sender. To the system, it looks like every 
sender task talks to every receiver task, and it does not know that you 
intend to send 95% of all data through one channel.

> Partial shuffling
> -----------------
>
>                 Key: FLINK-2193
>                 URL: https://issues.apache.org/jira/browse/FLINK-2193
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Sebastian Kruse
>            Priority: Minor
>
> In some cases, it would come in handy to shuffle only some specific elements 
> of a dataset instead of all elements. This is currently not achievable with a 
> custom partitioner.
> Use cases for such a feature are:
> * Load balancing: split up elements that require high processing load and 
> distribute the splits among all task managers.
> * Evolutionary algorithms: A well-suited EA model for Map/Reduce-like 
> platforms is the island model, where each worker maintains and evolves its 
> own population. From time to time, individuals among the population need to 
> be exchanged. Shuffling all the complete populations is not necessary, though.
> A presumably easy way to achieve this feature could be to provide the local 
> partition number in deployed partitioners, similar to 
> {{RichFunction#getRuntimeContext()#getIndexOfThisSubtask()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
