I agree with Fabian on this. Let's cancel the release and create a new RC.

On 17 Oct 2014, at 12:11, Fabian Hueske <[email protected]> wrote:

> Yes, that was intentional.
> 
> The whole point of using a parallel engine is to process large datasets.
> Otherwise you could do it in Python on a single box...
> Remote reads will severely impact the performance and might cause
> significant performance regression.
> 
> 2014-10-17 12:04 GMT+02:00 Robert Metzger <[email protected]>:
> 
>> Did you intentionally post to the mailing list?
>> 
>> I'm investigating the issue.
>> So far, I found that the hostname has never been passed to the input split
>> assigner. I guess this issue was introduced by the recent jobmanager
>> changes.
>> Second, Flink uses the fully qualified hostname, whereas HDFS uses
>> only the short hostname. This caused a string mismatch.
>> 
>> I wouldn't cancel the release, because at this point it is faster
>> to vote on a bugfix release.
>> The issue is not a showstopper for using Flink. It's just slow on large
>> datasets.
>> 
>> On Fri, Oct 17, 2014 at 11:58 AM, Fabian Hueske <[email protected]>
>> wrote:
>> 
>>> This is a critical issue and sounds a bit like a release blocker for 0.7 to
>>> me.
>>> 
>>> Other opinions?
>>> 
>>> 2014-10-17 11:25 GMT+02:00 Robert Metzger (JIRA) <[email protected]>:
>>> 
>>>> Robert Metzger created FLINK-1170:
>>>> -------------------------------------
>>>> 
>>>>             Summary: Localization of InputSplits is not working properly
>>>>                 Key: FLINK-1170
>>>>                 URL: https://issues.apache.org/jira/browse/FLINK-1170
>>>>             Project: Flink
>>>>          Issue Type: Bug
>>>>          Components: Distributed Runtime
>>>>            Reporter: Robert Metzger
>>>>            Assignee: Robert Metzger
>>>> 
>>>> 
>>>> While running some benchmarks, I found that Flink is not properly
>>>> assigning the InputSplits.
>>>> 
>>>> On my testing cluster, ALL splits were assigned to remote HDFS
>>>> DataNodes,
>>>> which causes a lot of network I/O.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v6.3.4#6332)
>>>> 
>>> 
>> 

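A minimal sketch of the two problems Robert describes: the split assigner must receive the requesting host at all, and a fully qualified hostname (as used by Flink) must be compared with a short hostname (as reported by HDFS) on a common form. This is illustrative Python, not Flink's actual Java code. The names `InputSplit`, `normalize_host` and `assign_split` are hypothetical, and stripping the domain suffix is only one assumed way to normalize, not necessarily how FLINK-1170 was fixed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InputSplit:
    split_id: int
    hosts: List[str]  # hostnames HDFS reports for this split's blocks (hypothetical model)


def normalize_host(hostname: str) -> str:
    # Assumed normalization: keep only the first label, lower-cased,
    # so "worker1.cluster.example.com" matches "worker1".
    # Simplification: this would mangle raw IP addresses.
    return hostname.split(".", 1)[0].lower()


def assign_split(
    requesting_host: Optional[str],
    unassigned: List[InputSplit],
    normalize: bool = True,
) -> Tuple[Optional[InputSplit], bool]:
    """Return (split, is_local); prefer a split with a replica on the requesting host."""
    if not unassigned:
        return None, False

    def key(h: str) -> str:
        return normalize_host(h) if normalize else h

    # If the hostname is never passed in (requesting_host is None),
    # every assignment silently falls through to a remote split.
    if requesting_host is not None:
        wanted = key(requesting_host)
        for i, split in enumerate(unassigned):
            if any(key(h) == wanted for h in split.hosts):
                return unassigned.pop(i), True

    # No local split found: hand out any remaining split (remote read).
    return unassigned.pop(0), False
```

With `normalize=False`, a request from `worker1.cluster.example.com` never matches a split whose only host is `worker1`, so it gets a remote split even though a local one exists; that is the string mismatch described in the thread.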