[ https://issues.apache.org/jira/browse/FLINK-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Hueske resolved FLINK-1287.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 0.8-incubating

Fixed with e0a4ee07084bc6ab56a20fbc4a18863462da93eb

> Improve File Input Split assignment
> -----------------------------------
>
>                 Key: FLINK-1287
>                 URL: https://issues.apache.org/jira/browse/FLINK-1287
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>            Reporter: Robert Metzger
>            Assignee: Fabian Hueske
>             Fix For: 0.8-incubating
>
>
> While running some DFS read-intensive benchmarks, I found that the assignment
> of input splits is not optimal, in particular when numWorker != numDataNodes
> and the replication factor is low (in my case it was 1).
> In this example, the input had 40960 splits, of which 4694 were read remotely,
> while Spark performed only 2056 remote reads for the same dataset.
> With the replication factor increased to 2, Flink did only 290 remote reads,
> so users usually shouldn't be affected by this issue.
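For context, the improvement concerns how file input splits are matched to workers based on data locality. Below is a minimal, hypothetical Java sketch of the general idea (hand a requesting worker a split with a local replica first, and fall back to a remote split only when none is left); the Split class and method names are illustrative assumptions, not Flink's actual split-assigner API.

import java.util.*;

// Hypothetical locality-aware split assignment: prefer splits whose data
// is stored on the requesting worker's host, otherwise read remotely.
public class LocalityAwareAssigner {

    static final class Split {
        final int id;
        final Set<String> hosts;   // data-node hostnames holding a replica
        Split(int id, String... hosts) {
            this.id = id;
            this.hosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    private final Deque<Split> unassigned = new ArrayDeque<>();

    LocalityAwareAssigner(Collection<Split> splits) {
        unassigned.addAll(splits);
    }

    // Returns the next split for the given worker host, preferring local reads;
    // returns null when all splits have been handed out.
    synchronized Split nextSplit(String workerHost) {
        for (Iterator<Split> it = unassigned.iterator(); it.hasNext(); ) {
            Split s = it.next();
            if (s.hosts.contains(workerHost)) {   // local replica available
                it.remove();
                return s;
            }
        }
        return unassigned.poll();                 // remote read as a fallback
    }

    public static void main(String[] args) {
        LocalityAwareAssigner assigner = new LocalityAwareAssigner(Arrays.asList(
                new Split(0, "nodeA"), new Split(1, "nodeB"), new Split(2, "nodeA")));
        System.out.println(assigner.nextSplit("nodeB").id);  // 1 (local)
        System.out.println(assigner.nextSplit("nodeC").id);  // 0 (remote fallback)
    }
}

With a replication factor of 1 each split has exactly one local host, so any imbalance between workers and data nodes forces remote reads under this greedy scheme; higher replication gives the assigner more local candidates per worker, which matches the drop in remote reads reported above.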



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
