Hi all,
thank you all for the prompt replies. It is great to know that there is such 
strong community support. Yes indeed, I don't use Hadoop yet. I wanted to try 
out the Flink framework first and then integrate it with Hadoop; I have read 
somewhere that Hadoop is not obligatory.
I wonder why the same program with the same configuration works fine for small 
files while this error appears only for the bigger ones. The example program 
"Word count" always works fine, so I suppose the mistake is somewhere on my 
side.


-----Original Message-----
From: Aljoscha Krettek [mailto:[email protected]] 
Sent: Sunday, 29 June 2014 09:24
To: [email protected]
Subject: Re: Cluster execution of an example program ("Word count") and a 
problem related to the modified example

Hi Krzysztof,
regarding the file access problem: from the path it looks like you are 
accessing the files as local files rather than as files in a distributed file 
system (HDFS is the default here). One of the nodes can access the file 
because it happens to be on the machine where that part of the code runs, 
while the rest of the code executes on machines where the file is not 
available. This page explains how to set up Hadoop with HDFS:
http://hadoop.apache.org/docs/r1.2.1/cluster_setup.html . You only need to 
start HDFS, though, with "bin/start-dfs.sh". For accessing files inside HDFS 
from Flink you would use a path such as "hdfs:///foo/bar"
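To make the distinction concrete: Flink decides which file system to use from 
the URI scheme of the path, so a bare path like "/home/user/input.txt" is read 
as a local file on whichever machine the task happens to run, while 
"hdfs:///foo/bar" goes to HDFS. A minimal sketch of that resolution, using 
only the standard java.net.URI class (the sample paths are made up for 
illustration):

```java
import java.net.URI;

public class PathSchemes {
    // Returns the file-system scheme a path string would resolve to.
    // A path with no scheme is treated as a local file -- which is why
    // a distributed job reading such a path fails on nodes that do not
    // have a local copy of the file.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "local file" : scheme;
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("/home/user/input.txt")); // local file
        System.out.println(schemeOf("hdfs:///foo/bar"));      // hdfs
    }
}
```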

Please write again if you need more help.

Aljoscha


On Sat, Jun 28, 2014 at 10:57 PM, Ufuk Celebi <[email protected]> wrote:

>
> > On 28 Jun 2014, at 22:52, Stephan Ewen <[email protected]> wrote:
> >
> > Hey!
> >
> > You can always get the result in a single file by setting the
> parallelism
> > of the sink task to one, for example:
> > "result.writeAsText(path).parallelism(1)".
>
> Oh sure. I realized this after sending the mail. Thanks for pointing 
> it out. :)
>