Hi,

As Timo said, you need a distributed filesystem such as HDFS so that all task managers can access the files.
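
For example, if HDFS is available to the cluster, you could put the input file into HDFS and point the job at hdfs:// URIs. The paths below are placeholders, and this assumes Flink can see the Hadoop configuration (e.g. via HADOOP_CONF_DIR):

    hdfs dfs -put /home/root1/hamlet.txt /user/root1/hamlet.txt
    ./bin/flink run ./examples/streaming/WordCount.jar \
      --input hdfs:///user/root1/hamlet.txt \
      --output hdfs:///user/root1/wordcount_out

That way all task managers read and write through the shared file system instead of their local disks.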

Best regards,
Felix

On Aug 8, 2017 09:01, "P. Ramanjaneya Reddy" <ramanji...@gmail.com> wrote:

Hi Timo,

How can I make the files accessible across the TMs?

Thanks & Regards,
Ramanji.

On Mon, Aug 7, 2017 at 7:45 PM, Timo Walther <twal...@apache.org> wrote:

> Flink is distributed software that runs on a cluster. You need something like a
> distributed file system so that input and output files can be
> accessed from all nodes.
>
> Each TM has a log directory where the execution logs are stored.
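>
> In a standard standalone setup the logs usually end up in the log/ directory of the
> Flink installation on each node (the exact file names vary by user, TM id and host,
> so treat this as a sketch):
>
>     # run on a task manager node, inside the Flink installation directory
>     tail -f log/flink-*-taskmanager-*.log
>
> The web interface also links to each TaskManager's log on its detail page.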
>
> You can set additional properties on your output format by importing the
> example code into your IDE and modifying it.
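>
> For the NO_OVERWRITE failure further down in this thread: here is a minimal sketch
> (not the shipped example's exact source) of a WordCount-style job whose text sink
> uses WriteMode.OVERWRITE, so re-runs replace the previous output. The paths are the
> ones from this thread:
>
> import org.apache.flink.api.common.functions.FlatMapFunction;
> import org.apache.flink.api.java.tuple.Tuple2;
> import org.apache.flink.core.fs.FileSystem.WriteMode;
> import org.apache.flink.streaming.api.datastream.DataStream;
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> import org.apache.flink.util.Collector;
>
> public class WordCountOverwrite {
>     public static void main(String[] args) throws Exception {
>         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>
>         DataStream<Tuple2<String, Integer>> counts = env
>             .readTextFile("file:///home/root1/hamlet.txt")
>             .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
>                 @Override
>                 public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
>                     // split the line into words and emit a (word, 1) pair per word
>                     for (String token : line.toLowerCase().split("\\W+")) {
>                         if (!token.isEmpty()) {
>                             out.collect(new Tuple2<String, Integer>(token, 1));
>                         }
>                     }
>                 }
>             })
>             .keyBy(0)
>             .sum(1);
>
>         // OVERWRITE replaces an existing output path instead of failing as in NO_OVERWRITE mode
>         counts.writeAsText("file:///home/root1/wordcount_out", WriteMode.OVERWRITE);
>
>         env.execute("Streaming WordCount with OVERWRITE sink");
>     }
> }
>
> Alternatively, if the old output is not needed, simply removing the existing
> /home/root1/wordcount_out before re-running also avoids the NO_OVERWRITE failure.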
>
> On 07.08.17 at 16:03, P. Ramanjaneya Reddy wrote:
>
> Hi Timo,
>> The problem is resolved after copying the input file to all task managers.
>>
>> And where is the output file generated? On the job manager or on a task manager?
>>
>> Where can I see the execution logs to understand how the word count is done on
>> each task manager?
>>
>>
>> By the way, is there any option to overwrite...?
>>
>> 08/07/2017 19:27:00 Keyed Aggregation -> Sink: Unnamed(1/1) switched to FAILED
>> java.io.IOException: File or directory already exists. Existing files and directories are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to overwrite existing files and directories.
>> at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:763)
>> at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.initOutPathLocalFS(SafetyNetWrapperFileSystem.java:135)
>> at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:231)
>> at org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:78)
>> at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:61)
>> at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>> at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:111)
>> at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376)
>> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>> On Mon, Aug 7, 2017 at 6:49 PM, Timo Walther <twal...@apache.org> wrote:
>>
>>> Make sure that the file exists and is accessible from all Flink task managers.
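>>>
>>> If you are not using a distributed file system, one workaround (just a sketch; tm1 and tm2 are placeholder host names for the task manager machines) is to copy the input to the same local path on every node:
>>>
>>>     for host in tm1 tm2; do scp /home/root1/hamlet.txt root1@$host:/home/root1/; done
>>>
>>> A shared location such as HDFS or NFS avoids this manual copying.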
>>>
>>>
>>> On 07.08.17 at 14:35, P. Ramanjaneya Reddy wrote:
>>>
>>> Thank you Timo.
>>>>
>>>>
>>>> root1@root1-HP-EliteBook-840-G2:~/NAI/Tools/BEAM/Flink_Cluster/rama/flink$ ./bin/flink run ./examples/streaming/WordCount.jar --input file:///home/root1/hamlet.txt --output file:///home/root1/wordcount_out
>>>>
>>>>
>>>>
>>>> Execution of the WordCount jar gives an error...
>>>>
>>>> 08/07/2017 18:03:16 Source: Custom File Source(1/1) switched to FAILED
>>>> java.io.FileNotFoundException: The provided file path file:/home/root1/hamlet.txt does not exist.
>>>> at org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:192)
>>>> at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
>>>> at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
>>>> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
>>>> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
>>>> at java.lang.Thread.run(Thread.java:748)
>>>>
>>>>
>>>> On Mon, Aug 7, 2017 at 5:56 PM, Timo Walther <twal...@apache.org>
>>>> wrote:
>>>>
>>>> Hi Ramanji,
>>>>
>>>>> you can find the source code of the examples here:
>>>>> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java
>>>>>
>>>>> A general introduction to how cluster execution works can be found here:
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/programming-model.html#programs-and-dataflows
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/runtime.html
>>>>>
>>>>> It might also be helpful to have a look at the web interface which can
>>>>> show you a nice graph of the job.
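>>>>>
>>>>> In a default standalone setup the web interface should be reachable at http://<jobmanager-host>:8081 (8081 being the default; adjust it if you changed jobmanager.web.port in flink-conf.yaml).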
>>>>>
>>>>> I hope this helps. Feel free to ask further questions.
>>>>>
>>>>> Regards,
>>>>> Timo
>>>>>
>>>>>
>>>>> On 07.08.17 at 14:00, P. Ramanjaneya Reddy wrote:
>>>>>
>>>>> Hello Everyone,
>>>>>
>>>>>> I have followed the steps in the link below to install and run Apache Flink on a multi-node cluster:
>>>>>>
>>>>>> http://data-flair.training/blogs/install-run-deploy-flink-multi-node-cluster/
>>>>>> I used flink-1.3.2-bin-hadoop27-scala_2.10.tgz for the installation.
>>>>>>
>>>>>> Using the command
>>>>>>     "bin/flink run /home/root1/NAI/Tools/BEAM/Flink_Cluster/rama/flink/examples/streaming/WordCount.jar"
>>>>>> I was able to run WordCount, but where can I see which input was used and what output was generated?
>>>>>>
>>>>>> And how can I specify the input and output paths?
>>>>>>
>>>>>> I'm trying to understand how WordCount works on a multi-node cluster.
>>>>>>
>>>>>> Any suggestions would help me understand this further.
>>>>>>
>>>>>> Thanks & Regards,
>>>>>> Ramanji.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>
