[jira] [Commented] (FLINK-1060) Support explicit shuffling of DataSets

ASF GitHub Bot (JIRA) Mon, 01 Sep 2014 10:05:43 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117564#comment-14117564
 ]


ASF GitHub Bot commented on FLINK-1060:
---------------------------------------

Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/108#issuecomment-54079208
  
    I successfully looked superficially over the code ;)


> Support explicit shuffling of DataSets
> --------------------------------------
>
>                 Key: FLINK-1060
>                 URL: https://issues.apache.org/jira/browse/FLINK-1060
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Optimizer
>            Reporter: Fabian Hueske
>            Assignee: Fabian Hueske
>            Priority: Minor
>
> Right now, Flink only shuffles data if it is required by some operation such 
> as Reduce, Join, or CoGroup. There is no way to explicitly shuffle a data set.
> However, explicit shuffling would be very helpful in some situations, 
> including:
> - rebalancing before compute-intensive Map operations
> - balancing, random or hash partitioning before PartitionMap operations (see 
> FLINK-1053)
> - better integration of support for HadoopJobs (see FLINK-838)
> With this issue, I propose to add the following methods to {{DataSet}}:
> - {{DataSet.partitionHashBy(int...)}} and 
> {{DataSet.partitionHashBy(KeySelector)}} to perform an explicit hash 
> partitioning
> - {{DataSet.partitionRandomly()}} to shuffle data completely at random
> - {{DataSet.partitionRoundRobin()}} to shuffle data in a round-robin fashion 
> that generates a very even distribution, with possible bias due to prior 
> distributions
> The {{DataSet.partitionRoundRobin()}} method might not be necessary if we 
> think that random shuffling balances well enough.
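
To make the difference between the three proposed strategies concrete, here is a minimal sketch in plain Java. It does not use Flink and is not the code from the pull request; the class name `PartitionSketch` and all of its methods are hypothetical and only illustrate how a record could be assigned to one of `numPartitions` target partitions under each strategy.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only: none of these names come from Flink.
public class PartitionSketch {

    // Hash partitioning: records with equal keys always map to the same partition.
    static int hashPartition(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Random partitioning: every record goes to a uniformly chosen partition.
    static int randomPartition(Random rnd, int numPartitions) {
        return rnd.nextInt(numPartitions);
    }

    // Round-robin partitioning: the i-th record of a sender goes to partition i mod n.
    static int roundRobinPartition(long recordIndex, int numPartitions) {
        return (int) (recordIndex % numPartitions);
    }

    // Counts how many of numRecords records a single sender puts into each partition
    // when it distributes them round-robin, starting at partition 0.
    static int[] countRoundRobin(int numRecords, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (long i = 0; i < numRecords; i++) {
            counts[roundRobinPartition(i, numPartitions)]++;
        }
        return counts;
    }

    // Same count, but for random assignment with a caller-supplied generator.
    static int[] countRandom(int numRecords, int numPartitions, Random rnd) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numRecords; i++) {
            counts[randomPartition(rnd, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 4;
        System.out.println("hash(\"flink\")  -> partition " + hashPartition("flink", n));
        System.out.println("round-robin    -> " + Arrays.toString(countRoundRobin(10, n)));
        System.out.println("random         -> " + Arrays.toString(countRandom(10, n, new Random())));
    }
}
```

In this sketch a single sender's round-robin counts differ by at most one record per partition, whereas the random counts vary from run to run. If every sender started its round-robin at partition 0, the low-numbered partitions would receive slightly more records overall, which is one way a bias of the kind mentioned above could arise.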



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
