Re: How to run wordcount program on Multi-node Cluster.

Timo Walther Mon, 07 Aug 2017 07:15:39 -0700

Flink is a distributed software for clusters. You need something like adistributed file system. So that input file and output files can beaccessed from all nodes.


Each TM has a log directory where the execution logs are stored.

You can set additional properties to your output format by importing thecode in your IDE.


Am 07.08.17 um 16:03 schrieb P. Ramanjaneya Reddy:

Hi Timo,
Problem is resolved after copy input file to all tasks managers.

and where should generate outputfile? Is it in jobmanager or task manager?

Where can i see the execution logs to understand how word count done each
task manager?


By the way any option to overwride...?

08/07/2017 19:27:00 Keyed Aggregation -> Sink: Unnamed(1/1) switched to
FAILED
java.io.IOException: File or directory already exists. Existing files and
directories are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to
overwrite existing files and directories.
at
org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:763)
at
org.apache.flink.core.fs.SafetyNetWrapperFileSystem.initOutPathLocalFS(SafetyNetWrapperFileSystem.java:135)
at
org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:231)
at
org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:78)
at
org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:61)
at
org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:111)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:745)


On Mon, Aug 7, 2017 at 6:49 PM, Timo Walther <[email protected]> wrote:

Make sure that the file exists and is accessible from all Flink tasks
managers.


Am 07.08.17 um 14:35 schrieb P. Ramanjaneya Reddy:

Thank you Timo.


root1@root1-HP-EliteBook-840-G2:~/NAI/Tools/BEAM/Flink_Clust
er/rama/flink$
*./bin/flink
run ./examples/streaming/WordCount.jar --input
file:///home/root1/hamlet.txt --output file:///home/root1/wordcount_out*



Execution of worcountjar gives error...

08/07/2017 18:03:16 Source: Custom File Source(1/1) switched to FAILED
java.io.FileNotFoundException: The provided file path
file:/home/root1/hamlet.txt does not exist.
at
org.apache.flink.streaming.api.functions.source.ContinuousFi
leMonitoringFunction.run(ContinuousFileMonitoringFunction.java:192)
at
org.apache.flink.streaming.api.operators.StreamSource.run(
StreamSource.java:87)
at
org.apache.flink.streaming.api.operators.StreamSource.run(
StreamSource.java:55)
at
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.
run(SourceStreamTask.java:95)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
StreamTask.java:263)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:748)


On Mon, Aug 7, 2017 at 5:56 PM, Timo Walther <[email protected]> wrote:

Hi Ramanji,

you can find the source code of the examples here:
https://github.com/apache/flink/blob/master/flink-examples/
flink-examples-streaming/src/main/java/org/apache/flink/
streaming/examples/wordcount/WordCount.java

A general introduction how the cluster execution works can be found here:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/
concepts/programming-model.html#programs-and-dataflows
https://ci.apache.org/projects/flink/flink-docs-release-1.4/
concepts/runtime.html

It might also be helpful to have a look at the web interface which can
show you a nice graph of the job.

I hope this helps. Feel free to ask further questions.

Regards,
Timo


Am 07.08.17 um 14:00 schrieb P. Ramanjaneya Reddy:

Hello Everyone,

I have followed the steps specified below link to Install & Run Apache
Flink on Multi-node Cluster.

http://data-flair.training/blogs/install-run-deploy-flink-
multi-node-cluster/
    used flink-1.3.2-bin-hadoop27-scala_2.10.tgz for install

using the command
    " bin/flink run
/home/root1/NAI/Tools/BEAM/Flink_Cluster/rama/flink/examples
/streaming/WordCount.jar"
able to run wordcount, but where can i see which input consider and
output
generated?

and how can i specify the input and output paths?

I'm trying to understand how the wordcount will work using Multi-node
Cluster.?

any suggestions will help me further understanding?

Thanks & Regards,
Ramanji.

Re: How to run wordcount program on Multi-node Cluster.

Reply via email to