Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Robert Metzger Fri, 17 Oct 2014 06:09:00 -0700

Okay. I see the point.

I'll write on general@incubator to cancel the vote.


On Fri, Oct 17, 2014 at 1:03 PM, Ufuk Celebi <[email protected]> wrote:

> I agree with Fabian on this. Let's cancel the release and create a new RC.
>
> On 17 Oct 2014, at 12:11, Fabian Hueske <[email protected]> wrote:
>
> > Yes, that was intentional.
> >
> > The whole point of using a parallel engine is to process large datasets.
> > Otherwise you could do it in Python on a single box...
> > Remote reads will severely impact performance and might cause a
> > significant performance regression.
> >
> > 2014-10-17 12:04 GMT+02:00 Robert Metzger <[email protected]>:
> >
> >> Did you intentionally post to the mailing list?
> >>
> >> I'm investigating the issue.
> >> So far, I found that the hostname has never been passed to the input
> >> split assigner. I guess this issue was introduced by the recent jobmanager
> >> changes.
> >> And secondly, Flink is using the fully qualified hostname, whereas HDFS is
> >> using the hostname only. This caused a string-mismatch.
> >>
> >> I wouldn't cancel the release because we are at a point where it is faster
> >> to vote on a bugfix release.
> >> The issue is not a show stopper for using Flink. It's just slow on large
> >> datasets.
> >>
> >> On Fri, Oct 17, 2014 at 11:58 AM, Fabian Hueske <[email protected]>
> >> wrote:
> >>
> >>> This is a critical issue and sounds a bit like a release blocker for 0.7
> >>> to me.
> >>>
> >>> Other opinions?
> >>>
> >>> 2014-10-17 11:25 GMT+02:00 Robert Metzger (JIRA) <[email protected]>:
> >>>
> >>>> Robert Metzger created FLINK-1170:
> >>>> -------------------------------------
> >>>>
> >>>>             Summary: Localization of InputSplits is not working properly
> >>>>                 Key: FLINK-1170
> >>>>                 URL: https://issues.apache.org/jira/browse/FLINK-1170
> >>>>             Project: Flink
> >>>>          Issue Type: Bug
> >>>>          Components: Distributed Runtime
> >>>>            Reporter: Robert Metzger
> >>>>            Assignee: Robert Metzger
> >>>>
> >>>>
> >>>> While running some benchmarks, I found that Flink is not properly
> >>>> assigning the InputSplits.
> >>>>
> >>>> On my testing cluster, ALL splits were assigned to remote HDFS DataNodes,
> >>>> which causes a lot of network I/O.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> This message was sent by Atlassian JIRA
> >>>> (v6.3.4#6332)
> >>>>
> >>>
> >>
>
>
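
The second cause mentioned in the thread (Flink reporting the fully
qualified hostname while HDFS reports only the short hostname, so a plain
string comparison never matches) can be illustrated with a minimal sketch.
This is not the code from the Flink repository or the actual fix for
FLINK-1170; the class HostnameMatch, the methods normalize and isLocal, and
the example hostnames are all hypothetical and only show one way such a
mismatch could be avoided.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not taken from the Flink code base. */
public final class HostnameMatch {

    private HostnameMatch() {}

    /** Reduces a host identifier to its short, lower-case form. */
    public static String normalize(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        // Leave IPv4 literals untouched; cutting at the first dot would corrupt them.
        if (h.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
            return h;
        }
        int dot = h.indexOf('.');
        // "worker1.example.com" becomes "worker1"; "worker1" stays as it is.
        return dot < 0 ? h : h.substring(0, dot);
    }

    /** Returns true if any host of the split names the same machine as taskManagerHost. */
    public static boolean isLocal(String taskManagerHost, String[] splitHosts) {
        String local = normalize(taskManagerHost);
        for (String candidate : splitHosts) {
            if (normalize(candidate).equals(local)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed example values: short names as HDFS would report them.
        String[] hdfsHosts = {"worker1", "worker2"};

        // Naive comparison of a fully qualified name against a short name fails.
        System.out.println("worker1.example.com".equals("worker1"));

        // Comparing normalized names finds the local split.
        System.out.println(isLocal("worker1.example.com", hdfsHosts));
        System.out.println(isLocal("worker3.example.com", hdfsHosts));
    }
}
```

One trade-off of comparing short names only: two machines in different
domains that share a short name would be treated as the same host, so a
real implementation might prefer to resolve both sides to a common form
instead.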
