Yes, that was intentional.

The whole point of using a parallel engine is to process large datasets.
Otherwise you could do it in Python on a single box...
Remote reads will severely hurt performance and can cause a significant
regression.

2014-10-17 12:04 GMT+02:00 Robert Metzger <[email protected]>:

> Did you intentionally post to the mailing list?
>
> I'm investigating the issue.
> So far, I found that the hostname has never been passed to the input split
> assigner. I guess this issue was introduced by the recent jobmanager
> changes.
> Secondly, Flink uses the fully qualified hostname, whereas HDFS uses
> only the short hostname. This causes a string mismatch.
>
> I wouldn't cancel the release, because at this point it is faster
> to vote on a bugfix release.
> The issue is not a showstopper for using Flink. It's just slow on large
> datasets.
>
> On Fri, Oct 17, 2014 at 11:58 AM, Fabian Hueske <[email protected]>
> wrote:
>
> > This is a critical issue and sounds a bit like a release blocker for 0.7
> > to me.
> >
> > Other opinions?
> >
> > 2014-10-17 11:25 GMT+02:00 Robert Metzger (JIRA) <[email protected]>:
> >
> > > Robert Metzger created FLINK-1170:
> > > -------------------------------------
> > >
> > >              Summary: Localization of InputSplits is not working properly
> > >                  Key: FLINK-1170
> > >                  URL: https://issues.apache.org/jira/browse/FLINK-1170
> > >              Project: Flink
> > >           Issue Type: Bug
> > >           Components: Distributed Runtime
> > >             Reporter: Robert Metzger
> > >             Assignee: Robert Metzger
> > >
> > >
> > > While running some benchmarks, I found that Flink is not properly
> > > assigning the InputSplits.
> > >
> > > On my testing cluster, ALL splits were assigned to remote HDFS DataNodes,
> > > which causes a lot of network I/O.
> > >
> > >
> > >
> > > --
> > > This message was sent by Atlassian JIRA
> > > (v6.3.4#6332)
> > >
> >
>
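A minimal Java sketch of the hostname mismatch described in the thread, under stated assumptions: the class `SplitAssignerSketch`, its `Split` type, and the `normalize`/`nextSplit` methods are hypothetical and are not Flink's actual InputSplitAssigner API. The sketch assumes two hostnames refer to the same machine when their short names (the text before the first dot) match case-insensitively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of locality-aware input split assignment.
 * Not Flink's real InputSplitAssigner; all names here are illustrative.
 */
public class SplitAssignerSketch {

    /** Hypothetical split: an id plus the hosts that store its data block. */
    public static final class Split {
        public final int id;
        public final String[] hosts;

        public Split(int id, String... hosts) {
            this.id = id;
            this.hosts = hosts;
        }
    }

    private final List<Split> unassigned = new ArrayList<>();

    public SplitAssignerSketch(List<Split> splits) {
        unassigned.addAll(splits);
    }

    /**
     * Assumption: compare hosts by short name (text before the first dot),
     * lowercased. A null host (hostname never passed in) becomes "".
     */
    public static String normalize(String host) {
        if (host == null) {
            return "";
        }
        int dot = host.indexOf('.');
        String shortName = dot >= 0 ? host.substring(0, dot) : host;
        return shortName.toLowerCase(Locale.ROOT);
    }

    /**
     * Returns a split stored on the requesting host if one exists,
     * otherwise the first remaining split (a remote read),
     * or null when no splits are left.
     */
    public Split nextSplit(String requestingHost) {
        if (unassigned.isEmpty()) {
            return null;
        }
        String wanted = normalize(requestingHost);
        for (int i = 0; i < unassigned.size(); i++) {
            for (String h : unassigned.get(i).hosts) {
                if (normalize(h).equals(wanted)) {
                    return unassigned.remove(i); // local read
                }
            }
        }
        return unassigned.remove(0); // no local split: fall back to a remote read
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split(0, "worker2", "worker3")); // short names, as HDFS reports them
        splits.add(new Split(1, "worker1", "worker2"));
        SplitAssignerSketch assigner = new SplitAssignerSketch(splits);

        // The requesting worker identifies itself by its fully qualified name.
        String requester = "worker1.cluster.example.com";
        System.out.println("raw match: " + requester.equals("worker1"));
        System.out.println("assigned split: " + assigner.nextSplit(requester).id);
    }
}
```

Running `main` should print `raw match: false` followed by `assigned split: 1`: plain string equality never matches, while normalizing both sides lets the local split win. A null requesting host falls through to a remote split, which mirrors the "all splits remote" symptom in the report. Cutting at the first dot is a simplification that would mangle IP addresses, so it would need more care in practice.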
