I agree, we should cancel the release, fix this, and make a new release candidate.
Stephan

On Fri, Oct 17, 2014 at 12:11 PM, Fabian Hueske <[email protected]> wrote:

> Yes, that was intentional.
>
> The whole point of using a parallel engine is to process large datasets.
> Otherwise you could do it in Python on a single box...
> Remote reads will severely impact the performance and might cause a
> significant performance regression.
>
> 2014-10-17 12:04 GMT+02:00 Robert Metzger <[email protected]>:
>
> > Did you intentionally post to the mailing list?
> >
> > I'm investigating the issue.
> > So far, I found that the hostname has never been passed to the input split
> > assigner. I guess this issue was introduced by the recent jobmanager
> > changes.
> > And secondly, Flink is using the fully qualified hostname, whereas HDFS is
> > using the hostname only. This caused a string mismatch.
> >
> > I wouldn't cancel the release because we are at a point where it is faster
> > to vote on a bugfix release.
> > The issue is not a show stopper for using Flink. It's just slow on large
> > datasets.
> >
> > On Fri, Oct 17, 2014 at 11:58 AM, Fabian Hueske <[email protected]>
> > wrote:
> >
> > > This is a critical issue and sounds a bit like a release blocker for 0.7
> > > to me.
> > >
> > > Other opinions?
> > >
> > > 2014-10-17 11:25 GMT+02:00 Robert Metzger (JIRA) <[email protected]>:
> > >
> > > > Robert Metzger created FLINK-1170:
> > > > -------------------------------------
> > > >
> > > >              Summary: Localization of InputSplits is not working properly
> > > >                  Key: FLINK-1170
> > > >                  URL: https://issues.apache.org/jira/browse/FLINK-1170
> > > >              Project: Flink
> > > >           Issue Type: Bug
> > > >           Components: Distributed Runtime
> > > >             Reporter: Robert Metzger
> > > >             Assignee: Robert Metzger
> > > >
> > > >
> > > > While running some benchmarks, I found that Flink is not properly
> > > > assigning the InputSplits.
> > > >
> > > > On my testing cluster, ALL splits were assigned to remote HDFS DataNodes,
> > > > which causes a lot of network I/O.
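For illustration, the hostname mismatch described in the quoted thread (fully qualified name on one side, short hostname on the other) can be sketched as follows. This is a minimal Python sketch and not Flink's actual code: the function names `short_hostname` and `is_local_split` and the example host names are hypothetical, and it assumes that hosts are given as names rather than IP addresses and that short names are unique within the cluster.

```python
from typing import Iterable


def short_hostname(host: str) -> str:
    """Return the first DNS label, lower-cased ('node1.example.com' -> 'node1')."""
    return host.strip().lower().split(".", 1)[0]


def is_local_split(task_host: str, split_hosts: Iterable[str]) -> bool:
    """True if any host storing the split matches the task's host after normalization."""
    wanted = short_hostname(task_host)
    return any(short_hostname(h) == wanted for h in split_hosts)


# Hypothetical example: the worker reports a fully qualified name,
# while the file system reports short names for the block locations.
task_host = "worker-3.cluster.example.com"
split_hosts = ["worker-3", "worker-7"]

# A plain string comparison never matches, so every split looks remote.
naive_match = task_host in split_hosts

# Comparing normalized names recognizes the split as local.
normalized_match = is_local_split(task_host, split_hosts)
```

Normalizing both sides before comparing is only one possible approach; the fix discussed in the thread may differ.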
