Hi all, thank you all for the prompt replies. It is great to know that there is such strong community support. Yes, indeed, I don't use Hadoop yet. I wanted to try out the Flink framework and then integrate it with Hadoop; I had read somewhere that Hadoop is not obligatory. I wonder why the same program with the same configuration works fine for small files while this error appears only for the bigger ones. The example program "Word count" always works fine, so I suppose the mistake is somewhere on my side.
-----Original Message-----
From: Aljoscha Krettek [mailto:[email protected]]
Sent: Sunday, 29 June 2014 09:24
To: [email protected]
Subject: Re: Cluster execution of an example program ("Word count") and a problem related to the modified example

Hi Krzysztof,

for the file access problem: from the path it looks like you are accessing the files as local files rather than as files in a distributed file system (HDFS is the default here). So one of the nodes can access the file because it is actually on the machine where that code is running, while the other task executes on a machine where the file is not available.

This explains how to set up Hadoop with HDFS: http://hadoop.apache.org/docs/r1.2.1/cluster_setup.html . You only need to start HDFS, though, with "bin/start-dfs.sh". For accessing files inside HDFS from Flink you would use a path such as "hdfs:///foo/bar".

Please write again if you need more help.

Aljoscha

On Sat, Jun 28, 2014 at 10:57 PM, Ufuk Celebi <[email protected]> wrote:
>
>
> On 28 Jun 2014, at 22:52, Stephan Ewen <[email protected]> wrote:
>
> > Hey!
> >
> > You can always get the result in a single file by setting the parallelism
> > of the sink task to one, for example
> > "result.writeAsText(path).parallelism(1)".
>
> Oh sure. I realized this after sending the mail. Thanks for pointing
> it out. :)
>
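To make the two suggestions above concrete, here is a minimal sketch assuming the Flink Java DataSet API (the class name and the HDFS paths below are made up for illustration): the input is read via an "hdfs:///" path so every node in the cluster can reach it, and the sink's parallelism is pinned to one so the result ends up in a single file. Note that the exact method name for the sink parallelism has varied between Flink versions; the quoted mail uses parallelism(1), while the DataSet API used here exposes setParallelism(1).

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class HdfsWordCount {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // "hdfs:///..." paths go through HDFS, so every node can read them;
        // a local path would have to exist on every machine of the cluster.
        DataSet<String> text = env.readTextFile("hdfs:///user/flink/input.txt");

        DataSet<Tuple2<String, Integer>> counts = text
            .flatMap(new Tokenizer())
            .groupBy(0)
            .sum(1);

        // Parallelism 1 on the sink => a single output file instead of
        // one file per parallel task.
        counts.writeAsText("hdfs:///user/flink/output").setParallelism(1);

        env.execute("HDFS word count");
    }

    // Splits each line into lower-case words and emits (word, 1) pairs.
    public static final class Tokenizer
            implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        }
    }
}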
