Basically I have added the input and output files to HDFS. But how do I specify the input file on the command line when running the wordcount program?

bin/flink run ./examples/streaming/WordCount.jar --input hdfs:///user/wordcount_input.txt --output hdfs:///user/wordcount_output.txt
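With hdfs:/// and no authority in the URI, Flink has to resolve the default file system from the Hadoop configuration, so that configuration must be visible to Flink (for example via fs.hdfs.hadoopconf in flink-conf.yaml or the HADOOP_CONF_DIR environment variable). A fully qualified URI avoids relying on that lookup. A sketch of such an invocation, assuming the NameNode's RPC address is localhost:9000 (the actual value is whatever fs.defaultFS in core-site.xml says):

    bin/flink run ./examples/streaming/WordCount.jar \
        --input hdfs://localhost:9000/user/wordcount_input.txt \
        --output hdfs://localhost:9000/user/wordcount_output.txt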
On Tue, Aug 8, 2017 at 3:48 PM, P. Ramanjaneya Reddy <ramanji...@gmail.com> wrote:

Yes Felix,

I have created the input and output files in HDFS:
http://localhost:50070/explorer.html#/user

But how can we access them?

bin/flink run ./examples/streaming/WordCount.jar --input hdfs:///user/wordcount_input.txt --output hdfs:///user/wordcount_output.txt

On Tue, Aug 8, 2017 at 2:55 PM, Felix Neutatz <neut...@googlemail.com> wrote:

Hi,

as Timo said, you need a distributed filesystem such as HDFS.

Best regards,
Felix

On Aug 8, 2017 09:01, "P. Ramanjaneya Reddy" <ramanji...@gmail.com> wrote:

Hi Timo,

How can I make the files accessible across task managers (TMs)?

Thanks & Regards,
Ramanji.

On Mon, Aug 7, 2017 at 7:45 PM, Timo Walther <twal...@apache.org> wrote:

Flink is distributed software for clusters. You need something like a distributed file system, so that the input and output files can be accessed from all nodes.

Each TM has a log directory where the execution logs are stored.

You can set additional properties on your output format by importing the code into your IDE.

On 07.08.17 at 16:03, P. Ramanjaneya Reddy wrote:

Hi Timo,

The problem is resolved after copying the input file to all task managers.

And where should the output file be generated? On the jobmanager or on a task manager?

Where can I see the execution logs to understand how the word count is done on each task manager?

By the way, is there any option to overwrite...?

08/07/2017 19:27:00    Keyed Aggregation -> Sink: Unnamed(1/1) switched to FAILED
java.io.IOException: File or directory already exists. Existing files and directories are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to overwrite existing files and directories.
    at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:763)
    at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.initOutPathLocalFS(SafetyNetWrapperFileSystem.java:135)
    at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:231)
    at org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:78)
    at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:61)
    at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:111)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
    at java.lang.Thread.run(Thread.java:745)
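Regarding the NO_OVERWRITE failure above: the write mode is set where the example creates its text sink, so changing it means editing the example source and rebuilding the jar, which is presumably what the hint about importing the code into the IDE refers to. A minimal sketch of that change, assuming the streaming WordCount from flink-examples and the Flink 1.3 DataStream API (the variable names mirror the example but are assumptions here):

    // needs: import org.apache.flink.core.fs.FileSystem;
    if (params.has("output")) {
        // Overwrite an existing output file/directory instead of failing in NO_OVERWRITE mode.
        counts.writeAsText(params.get("output"), FileSystem.WriteMode.OVERWRITE);
    } else {
        counts.print();
    }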
On Mon, Aug 7, 2017 at 6:49 PM, Timo Walther <twal...@apache.org> wrote:

Make sure that the file exists and is accessible from all Flink task managers.

On 07.08.17 at 14:35, P. Ramanjaneya Reddy wrote:

Thank you Timo.

root1@root1-HP-EliteBook-840-G2:~/NAI/Tools/BEAM/Flink_Cluster/rama/flink$ ./bin/flink run ./examples/streaming/WordCount.jar --input file:///home/root1/hamlet.txt --output file:///home/root1/wordcount_out

Execution of the wordcount jar gives an error...

08/07/2017 18:03:16    Source: Custom File Source(1/1) switched to FAILED
java.io.FileNotFoundException: The provided file path file:/home/root1/hamlet.txt does not exist.
    at org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:192)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
    at java.lang.Thread.run(Thread.java:748)

On Mon, Aug 7, 2017 at 5:56 PM, Timo Walther <twal...@apache.org> wrote:

Hi Ramanji,

you can find the source code of the examples here:
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java

A general introduction to how cluster execution works can be found here:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/programming-model.html#programs-and-dataflows
https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/runtime.html

It might also be helpful to have a look at the web interface, which can show you a nice graph of the job.

I hope this helps. Feel free to ask further questions.

Regards,
Timo

On 07.08.17 at 14:00, P. Ramanjaneya Reddy wrote:

Hello Everyone,

I have followed the steps in the link below to install and run Apache Flink on a multi-node cluster:
http://data-flair.training/blogs/install-run-deploy-flink-multi-node-cluster/

I used flink-1.3.2-bin-hadoop27-scala_2.10.tgz for the installation.

Using the command "bin/flink run /home/root1/NAI/Tools/BEAM/Flink_Cluster/rama/flink/examples/streaming/WordCount.jar" I am able to run wordcount, but where can I see which input was used and where the output was generated?

And how can I specify the input and output paths?

I'm trying to understand how wordcount works on a multi-node cluster.

Any suggestions would help my further understanding.

Thanks & Regards,
Ramanji.
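To tie the thread together: the example Timo linked resolves --input and --output itself, falling back to built-in sample text and to print() when they are omitted. The condensed sketch below is paraphrased rather than copied from the linked source, so treat the exact class and variable names as illustrative; it only shows where the two parameters are consumed.

    // Condensed, paraphrased sketch of the streaming WordCount example (Flink 1.3-era API).
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.utils.ParameterTool;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class WordCountSketch {
        public static void main(String[] args) throws Exception {
            final ParameterTool params = ParameterTool.fromArgs(args);
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // --input selects the source; without it the example falls back to built-in sample text.
            DataStream<String> text = params.has("input")
                    ? env.readTextFile(params.get("input"))
                    : env.fromElements("To be, or not to be, that is the question");

            DataStream<Tuple2<String, Integer>> counts = text
                    .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                        @Override
                        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                            for (String word : line.toLowerCase().split("\\W+")) {
                                if (!word.isEmpty()) {
                                    out.collect(new Tuple2<>(word, 1));
                                }
                            }
                        }
                    })
                    .keyBy(0)
                    .sum(1);

            // --output selects the sink; without it the counts go to the TaskManagers' stdout.
            if (params.has("output")) {
                counts.writeAsText(params.get("output"));
            } else {
                counts.print();
            }

            env.execute("Streaming WordCount sketch");
        }
    }

When --output is omitted, the results of print() end up in the TaskManagers' stdout, i.e. the *.out files in each TM's log directory mentioned earlier in the thread.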