Okay. I see the point. I'll write on general@incubator to cancel the vote.

On Fri, Oct 17, 2014 at 1:03 PM, Ufuk Celebi <[email protected]> wrote:

> I agree with Fabian on this. Let's cancel the release and create a new RC.
>
> On 17 Oct 2014, at 12:11, Fabian Hueske <[email protected]> wrote:
>
> > Yes, that was intentional.
> >
> > The whole point of using a parallel engine is to process large datasets.
> > Otherwise you could do it in Python on a single box...
> > Remote reads will severely impact the performance and might cause
> > significant performance regression.
> >
> > 2014-10-17 12:04 GMT+02:00 Robert Metzger <[email protected]>:
> >
> >> Did you intentionally post to the mailing list?
> >>
> >> I'm investigating the issue.
> >> So far, I found that the hostname has never been passed to the input
> >> split assigner. I guess this issue was introduced by the recent
> >> jobmanager changes.
> >> And secondly, Flink is using the fully qualified hostname, whereas HDFS
> >> is using the hostname only. This caused a string mismatch.
> >>
> >> I wouldn't cancel the release because we are at a point where it is
> >> faster to vote a bugfix release.
> >> The issue is not a show stopper for using Flink. It's just slow on large
> >> datasets.
> >>
> >> On Fri, Oct 17, 2014 at 11:58 AM, Fabian Hueske <[email protected]>
> >> wrote:
> >>
> >>> This is a critical issue and sounds a bit like a release blocker for
> >>> 0.7 to me.
> >>>
> >>> Other opinions?
> >>>
> >>> 2014-10-17 11:25 GMT+02:00 Robert Metzger (JIRA) <[email protected]>:
> >>>
> >>>> Robert Metzger created FLINK-1170:
> >>>> -------------------------------------
> >>>>
> >>>>              Summary: Localization of InputSplits is not working properly
> >>>>                  Key: FLINK-1170
> >>>>                  URL: https://issues.apache.org/jira/browse/FLINK-1170
> >>>>              Project: Flink
> >>>>           Issue Type: Bug
> >>>>           Components: Distributed Runtime
> >>>>             Reporter: Robert Metzger
> >>>>             Assignee: Robert Metzger
> >>>>
> >>>>
> >>>> While running some benchmarks, I found that Flink is not properly
> >>>> assigning the InputSplits.
> >>>>
> >>>> On my testing cluster, ALL splits were assigned to remote HDFS
> >>>> DataNodes, which causes a lot of network I/O.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> This message was sent by Atlassian JIRA
> >>>> (v6.3.4#6332)
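
For illustration only: the string mismatch described in the quoted thread (Flink reporting the fully qualified hostname, HDFS reporting the hostname only) could in principle be avoided by normalizing both names before comparing them. The sketch below is a minimal Java example under that assumption. The class `HostnameMatch` and its methods are hypothetical names, not taken from the Flink code base, and this is not the actual fix for FLINK-1170.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Flink: compares hostnames by their short form.
final class HostnameMatch {

    /** Reduces a hostname to its first label in lower case, e.g. "Node1.example.com" -> "node1". */
    static String shortName(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        // Assumption: strings made only of digits and dots are IPv4 literals; leave them untouched.
        if (h.matches("[0-9.]+")) {
            return h;
        }
        int dot = h.indexOf('.');
        return dot < 0 ? h : h.substring(0, dot);
    }

    /** True if both names refer to the same host once the domain part is ignored. */
    static boolean sameHost(String a, String b) {
        return shortName(a).equals(shortName(b));
    }

    /**
     * Returns the indices of the splits that have at least one replica on workerHost.
     * splitHosts[i] holds the hostnames reported for split i (as a file system might report them).
     */
    static List<Integer> localSplits(String workerHost, String[][] splitHosts) {
        List<Integer> local = new ArrayList<>();
        for (int i = 0; i < splitHosts.length; i++) {
            for (String h : splitHosts[i]) {
                if (sameHost(workerHost, h)) {
                    local.add(i);
                    break; // one local replica is enough
                }
            }
        }
        return local;
    }
}
```

With a plain `String.equals` comparison, "node1.example.com" and "node1" never match, so every split looks remote. Comparing short names trades that for the risk of confusing hosts that share a first label across different domains, which may or may not be acceptable on a given cluster.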
