Robert Metzger created FLINK-1287:
-------------------------------------
Summary: Improve File Input Split assignment
Key: FLINK-1287
URL: https://issues.apache.org/jira/browse/FLINK-1287
Project: Flink
Issue Type: Improvement
Components: Local Runtime
Reporter: Robert Metzger
While running some DFS read-intensive benchmarks, I found that the assignment
of input splits is not optimal, in particular when numWorker != numDataNodes
and when the replication factor is low (in my case it was 1).
In the particular example, the input had 40960 splits, of which 4694 were read
remotely. Spark did only 2056 remote reads for the same dataset.
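To put the two counts on the same scale, the share of remote reads can be computed directly from the numbers above. This is plain arithmetic on the reported figures; the variable names are made up for illustration.

```python
# Figures reported in this issue (replication factor 1).
TOTAL_SPLITS = 40960
remote_reads = {"Flink": 4694, "Spark": 2056}

# Percentage of splits that were read remotely, rounded to two decimals.
remote_share = {
    name: round(100 * count / TOTAL_SPLITS, 2)
    for name, count in remote_reads.items()
}

for name, share in remote_share.items():
    print(f"{name}: {share}% of splits read remotely")
```

So roughly 11.5% of the splits were read remotely by Flink, compared to roughly 5% by Spark.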
With the replication factor increased to 2, Flink did only 290 remote reads, so
users usually should not be affected by this issue.
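To illustrate the kind of effect described here, below is a small toy simulation. It is only a sketch and is not Flink's actual split assigner: the host names, the request order, and both assignment functions are hypothetical. It models replication factor 1 with three DataNodes but only two workers, and compares handing out splits in order with preferring a local split for the requesting worker.

```python
from collections import deque

# Hypothetical setup: 6 splits, replication factor 1, blocks spread over
# three DataNodes, but workers only run on dn0 and dn1.
splits = [
    ("s0", {"dn0"}), ("s1", {"dn1"}), ("s2", {"dn2"}),
    ("s3", {"dn0"}), ("s4", {"dn1"}), ("s5", {"dn2"}),
]
# Workers ask for their next split in alternating order (an assumption).
requests = ["dn0", "dn1", "dn0", "dn1", "dn0", "dn1"]


def assign_in_order(splits, requests):
    """Hand out splits in list order, ignoring locality. Returns remote reads."""
    queue = deque(splits)
    remote = 0
    for worker in requests:
        if not queue:
            break
        _, hosts = queue.popleft()
        if worker not in hosts:
            remote += 1
    return remote


def assign_local_first(splits, requests):
    """Prefer a split stored on the requesting worker's host. Returns remote reads."""
    remaining = dict(splits)  # split id -> set of hosts holding the block
    remote = 0
    for worker in requests:
        if not remaining:
            break
        local = next((sid for sid, hosts in remaining.items() if worker in hosts), None)
        if local is None:
            # No local split left for this worker: fall back to any split.
            local = next(iter(remaining))
            remote += 1
        del remaining[local]
    return remote


print("in order:   ", assign_in_order(splits, requests), "remote reads")
print("local first:", assign_local_first(splits, requests), "remote reads")
```

In this toy setup the two splits on dn2 can never be read locally because no worker runs there, so two remote reads are unavoidable; anything above that comes from the assignment order. With a higher replication factor each split has more candidate hosts, which is consistent with the much lower number of remote reads observed at replication factor 2.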