Yes Felix,

I have created input and output files in HDFS.
http://localhost:50070/explorer.html#/user


But how can we access it?

bin/flink run ./examples/streaming/WordCount.jar \
  --input hdfs:///user/wordcount_input.txt \
  --output hdfs:///user/wordcount_output.txt
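For reference, the HDFS input path has to exist before the job runs. A minimal staging sketch (the `hdfs` CLI and the `/user` directory are assumptions based on the paths above; the sample text is made up, and the HDFS commands are guarded so the sketch is safe to run anywhere):

```shell
# Create a small local input file for the job.
echo "to be or not to be" > wordcount_input.txt

# Copy it into HDFS so every TaskManager can read it.
# Guarded: these lines only run where an HDFS client is installed.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /user
  hdfs dfs -put -f wordcount_input.txt hdfs:///user/wordcount_input.txt
fi
```

After staging, `hdfs dfs -ls /user` should list the file, and `--input hdfs:///user/wordcount_input.txt` will then resolve on every node.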



On Tue, Aug 8, 2017 at 2:55 PM, Felix Neutatz <neut...@googlemail.com>
wrote:

> Hi,
>
> as Timo said, you need a distributed filesystem like HDFS.
>
> Best regards,
> Felix
>
> On Aug 8, 2017 09:01, "P. Ramanjaneya Reddy" <ramanji...@gmail.com> wrote:
>
> Hi Timo,
>
> How can I make the files accessible across task managers?
>
> Thanks & Regards,
> Ramanji.
>
> On Mon, Aug 7, 2017 at 7:45 PM, Timo Walther <twal...@apache.org> wrote:
>
> > Flink is distributed software for clusters. You need something like a
> > distributed file system, so that input and output files can be
> > accessed from all nodes.
> >
> > Each TM has a log directory where the execution logs are stored.
> >
> > You can set additional properties on your output format by importing the
> > example code into your IDE and modifying it.
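Besides editing the code, there is also a cluster-wide setting for the overwrite question raised below. A sketch, assuming the `fs.overwrite-files` key (default `false`) is honored by the file output formats, as the `NO_OVERWRITE` message in the stack trace suggests:

```shell
# Make file sinks default to OVERWRITE mode cluster-wide.
# `mkdir -p conf` only mirrors the layout of a Flink distribution here.
mkdir -p conf
echo "fs.overwrite-files: true" >> conf/flink-conf.yaml
```

The setting takes effect after restarting the cluster; alternatively, the example code can pass `FileSystem.WriteMode.OVERWRITE` to the sink explicitly.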
> >
> > On 07.08.17 at 16:03, P. Ramanjaneya Reddy wrote:
> >
> > Hi Timo,
> >> The problem is resolved after copying the input file to all task managers.
> >>
> >> And where should the output file be generated? On the jobmanager or a
> >> task manager?
> >>
> >> Where can I see the execution logs to understand how the word count was
> >> done on each task manager?
> >>
> >> By the way, is there any option to overwrite?
> >>
> >> 08/07/2017 19:27:00 Keyed Aggregation -> Sink: Unnamed(1/1) switched to FAILED
> >> java.io.IOException: File or directory already exists. Existing files and
> >> directories are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode
> >> to overwrite existing files and directories.
> >> at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:763)
> >> at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.initOutPathLocalFS(SafetyNetWrapperFileSystem.java:135)
> >> at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:231)
> >> at org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:78)
> >> at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:61)
> >> at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> >> at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:111)
> >> at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376)
> >> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
> >> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
> >> at java.lang.Thread.run(Thread.java:745)
> >>
> >>
> >> On Mon, Aug 7, 2017 at 6:49 PM, Timo Walther <twal...@apache.org> wrote:
> >>
> >>> Make sure that the file exists and is accessible from all Flink task
> >>> managers.
> >>>
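Without a shared filesystem, the same absolute path has to exist on every node. A sketch with hypothetical worker hostnames (`worker1` and `worker2` are placeholders, not from this thread; the actual `scp` is commented out so nothing is copied unless you uncomment it):

```shell
# Placeholder hostnames -- in practice take them from conf/slaves.
WORKERS="worker1 worker2"

for host in $WORKERS; do
  echo "would copy hamlet.txt to $host:/home/root1/hamlet.txt"
  # scp /home/root1/hamlet.txt "$host:/home/root1/hamlet.txt"   # uncomment on a real cluster
done
```

Pointing `--input` at HDFS instead avoids the per-node copying entirely.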
> >>>
> >>> On 07.08.17 at 14:35, P. Ramanjaneya Reddy wrote:
> >>>
> >>>> Thank you Timo.
> >>>>
> >>>>
> >>>> root1@root1-HP-EliteBook-840-G2:~/NAI/Tools/BEAM/Flink_Cluster/rama/flink$
> >>>> ./bin/flink run ./examples/streaming/WordCount.jar --input file:///home/root1/hamlet.txt --output file:///home/root1/wordcount_out
> >>>>
> >>>>
> >>>>
> >>>> Execution of the WordCount jar gives this error:
> >>>>
> >>>> 08/07/2017 18:03:16 Source: Custom File Source(1/1) switched to FAILED
> >>>> java.io.FileNotFoundException: The provided file path
> >>>> file:/home/root1/hamlet.txt does not exist.
> >>>> at org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:192)
> >>>> at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
> >>>> at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
> >>>> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
> >>>> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
> >>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
> >>>> at java.lang.Thread.run(Thread.java:748)
> >>>>
> >>>>
> >>>> On Mon, Aug 7, 2017 at 5:56 PM, Timo Walther <twal...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hi Ramanji,
> >>>>
> >>>>> you can find the source code of the examples here:
> >>>>> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java
> >>>>>
> >>>>> A general introduction to how cluster execution works can be found here:
> >>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/programming-model.html#programs-and-dataflows
> >>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/runtime.html
> >>>>>
> >>>>> It might also be helpful to have a look at the web interface, which
> >>>>> can show you a nice graph of the job.
> >>>>>
> >>>>> I hope this helps. Feel free to ask further questions.
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>>
> >>>>> On 07.08.17 at 14:00, P. Ramanjaneya Reddy wrote:
> >>>>>
> >>>>>> Hello Everyone,
> >>>>>>
> >>>>>> I have followed the steps in the link below to install and run Apache
> >>>>>> Flink on a multi-node cluster:
> >>>>>>
> >>>>>> http://data-flair.training/blogs/install-run-deploy-flink-multi-node-cluster/
> >>>>>> (used flink-1.3.2-bin-hadoop27-scala_2.10.tgz for the install)
> >>>>>>
> >>>>>> Using the command
> >>>>>>     "bin/flink run /home/root1/NAI/Tools/BEAM/Flink_Cluster/rama/flink/examples/streaming/WordCount.jar"
> >>>>>> I am able to run WordCount, but where can I see which input was used
> >>>>>> and where the output was generated?
> >>>>>>
> >>>>>> And how can I specify the input and output paths?
> >>>>>>
> >>>>>> I'm trying to understand how WordCount works on a multi-node cluster.
> >>>>>>
> >>>>>> Any suggestions would help my further understanding.
> >>>>>>
> >>>>>> Thanks & Regards,
> >>>>>> Ramanji.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >
>
