Did you intentionally post to the mailing list? I'm investigating the issue. So far, I have found two problems. First, the hostname is never passed to the input split assigner. I guess this was introduced by the recent JobManager changes. Second, Flink is using the fully qualified hostname, whereas HDFS is using the hostname only. This causes a string mismatch, so the comparison that should identify a split as local never succeeds.
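To illustrate the second point, here is a minimal sketch of the mismatch and one possible way to compare the names. This is not the actual Flink code: the class `HostnameNormalizer`, its methods `shortName` and `sameHost`, and the example hostnames are all hypothetical, and the sketch assumes the inputs are hostnames rather than IP address literals.

```java
import java.util.Locale;

public class HostnameNormalizer {

    // Hypothetical helper: reduce a hostname to its first label, lower-cased.
    // Assumes the input is a hostname, not an IP address literal.
    static String shortName(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        int dot = h.indexOf('.');
        return dot < 0 ? h : h.substring(0, dot);
    }

    // Two names refer to the same host if their short names match.
    static boolean sameHost(String a, String b) {
        return shortName(a).equals(shortName(b));
    }

    public static void main(String[] args) {
        // Made-up example values: a fully qualified name on one side,
        // a bare hostname on the other.
        String taskManagerHost = "node1.cluster.example.org";
        String splitHost = "node1";

        // Plain string comparison fails, so the split looks remote.
        System.out.println(taskManagerHost.equals(splitHost));    // prints "false"

        // Comparing the normalized short names succeeds.
        System.out.println(sameHost(taskManagerHost, splitHost)); // prints "true"
    }
}
```

Comparing only the first label can give false positives if two machines in different domains share a short name, so this is only reasonable within a single cluster.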
I wouldn't cancel the release, because we are at a point where it is faster to vote on a bugfix release.
The issue is not a show stopper for using Flink. It's just slow on large datasets.

On Fri, Oct 17, 2014 at 11:58 AM, Fabian Hueske <[email protected]> wrote:

> This is a critical issue and sounds bit like a release blocker for 0.7 to
> me.
>
> Other opinions?
>
> 2014-10-17 11:25 GMT+02:00 Robert Metzger (JIRA) <[email protected]>:
>
> > Robert Metzger created FLINK-1170:
> > -------------------------------------
> >
> > Summary: Localization of InputSplits is not working properly
> > Key: FLINK-1170
> > URL: https://issues.apache.org/jira/browse/FLINK-1170
> > Project: Flink
> > Issue Type: Bug
> > Components: Distributed Runtime
> > Reporter: Robert Metzger
> > Assignee: Robert Metzger
> >
> >
> > While running some benchmarks, I found that Flink is not properly
> > assigning the InputSplits.
> >
> > On my testing cluster, ALL splits were assigned to remote HDFS DataNodes,
> > which causes a lot of network I/O.
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.3.4#6332)
