Basically I have added the input and output files to HDFS. But how do I specify the input file on the command line when running the wordcount program?

bin/flink run ./examples/streaming/WordCount.jar --input hdfs:///user/wordcount_input.txt --output hdfs:///user/wordcount_output.txt
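With hdfs:/// and no authority in the URI, Flink has to resolve the default file system from the Hadoop configuration, so that configuration must be visible to Flink (for example via fs.hdfs.hadoopconf in flink-conf.yaml or the HADOOP_CONF_DIR environment variable). A fully qualified URI avoids relying on that lookup. A sketch of such an invocation, assuming the NameNode's RPC address is localhost:9000 (the actual value is whatever fs.defaultFS in core-site.xml says):

    bin/flink run ./examples/streaming/WordCount.jar \
        --input hdfs://localhost:9000/user/wordcount_input.txt \
        --output hdfs://localhost:9000/user/wordcount_output.txt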
On Tue, Aug 8, 2017 at 3:48 PM, P. Ramanjaneya Reddy <ramanji...@gmail.com> wrote:

Yes Felix,

I have created the input and output files in HDFS:
http://localhost:50070/explorer.html#/user

But how can we access them?

bin/flink run ./examples/streaming/WordCount.jar --input hdfs:///user/wordcount_input.txt --output hdfs:///user/wordcount_output.txt

On Tue, Aug 8, 2017 at 2:55 PM, Felix Neutatz <neut...@googlemail.com> wrote:

Hi,

as Timo said, you need a distributed filesystem such as HDFS.

Best regards,
Felix

On Aug 8, 2017 09:01, "P. Ramanjaneya Reddy" <ramanji...@gmail.com> wrote:

Hi Timo,

How can I make the files accessible across task managers (TMs)?

Thanks & Regards,
Ramanji.

On Mon, Aug 7, 2017 at 7:45 PM, Timo Walther <twal...@apache.org> wrote:

Flink is distributed software for clusters. You need something like a distributed file system, so that the input and output files can be accessed from all nodes.

Each TM has a log directory where the execution logs are stored.

You can set additional properties on your output format by importing the code into your IDE.

On 07.08.17 at 16:03, P. Ramanjaneya Reddy wrote:

Hi Timo,

The problem is resolved after copying the input file to all task managers.

And where should the output file be generated? On the jobmanager or on a task manager?

Where can I see the execution logs to understand how the word count is done on each task manager?

By the way, is there any option to overwrite...?

08/07/2017 19:27:00    Keyed Aggregation -> Sink: Unnamed(1/1) switched to FAILED
java.io.IOException: File or directory already exists. Existing files and directories are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to overwrite existing files and directories.
    at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:763)
    at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.initOutPathLocalFS(SafetyNetWrapperFileSystem.java:135)
    at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:231)
    at org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:78)
    at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:61)
    at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:111)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
    at java.lang.Thread.run(Thread.java:745)
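Regarding the NO_OVERWRITE failure above: the write mode is set where the example creates its text sink, so changing it means editing the example source and rebuilding the jar, which is presumably what the hint about importing the code into the IDE refers to. A minimal sketch of that change, assuming the streaming WordCount from flink-examples and the Flink 1.3 DataStream API (the variable names mirror the example but are assumptions here):

    // needs: import org.apache.flink.core.fs.FileSystem;
    if (params.has("output")) {
        // Overwrite an existing output file/directory instead of failing in NO_OVERWRITE mode.
        counts.writeAsText(params.get("output"), FileSystem.WriteMode.OVERWRITE);
    } else {
        counts.print();
    }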
On Mon, Aug 7, 2017 at 6:49 PM, Timo Walther <twal...@apache.org> wrote:

Make sure that the file exists and is accessible from all Flink task managers.

On 07.08.17 at 14:35, P. Ramanjaneya Reddy wrote:

Thank you Timo.

root1@root1-HP-EliteBook-840-G2:~/NAI/Tools/BEAM/Flink_Cluster/rama/flink$ ./bin/flink run ./examples/streaming/WordCount.jar --input file:///home/root1/hamlet.txt --output file:///home/root1/wordcount_out

Execution of the wordcount jar gives an error...

08/07/2017 18:03:16    Source: Custom File Source(1/1) switched to FAILED
java.io.FileNotFoundException: The provided file path file:/home/root1/hamlet.txt does not exist.
    at org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:192)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
    at java.lang.Thread.run(Thread.java:748)

On Mon, Aug 7, 2017 at 5:56 PM, Timo Walther <twal...@apache.org> wrote:

Hi Ramanji,

you can find the source code of the examples here:
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java

A general introduction to how cluster execution works can be found here:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/programming-model.html#programs-and-dataflows
https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/runtime.html

It might also be helpful to have a look at the web interface, which can show you a nice graph of the job.

I hope this helps. Feel free to ask further questions.

Regards,
Timo

On 07.08.17 at 14:00, P. Ramanjaneya Reddy wrote:

Hello Everyone,

I have followed the steps in the link below to install and run Apache Flink on a multi-node cluster:
http://data-flair.training/blogs/install-run-deploy-flink-multi-node-cluster/

I used flink-1.3.2-bin-hadoop27-scala_2.10.tgz for the installation.

Using the command "bin/flink run /home/root1/NAI/Tools/BEAM/Flink_Cluster/rama/flink/examples/streaming/WordCount.jar" I am able to run wordcount, but where can I see which input was used and where the output was generated?

And how can I specify the input and output paths?

I'm trying to understand how wordcount works on a multi-node cluster.

Any suggestions would help my further understanding.

Thanks & Regards,
Ramanji.
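To tie the thread together: the example Timo linked resolves --input and --output itself, falling back to built-in sample text and to print() when they are omitted. The condensed sketch below is paraphrased rather than copied from the linked source, so treat the exact class and variable names as illustrative; it only shows where the two parameters are consumed.

    // Condensed, paraphrased sketch of the streaming WordCount example (Flink 1.3-era API).
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.utils.ParameterTool;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class WordCountSketch {
        public static void main(String[] args) throws Exception {
            final ParameterTool params = ParameterTool.fromArgs(args);
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // --input selects the source; without it the example falls back to built-in sample text.
            DataStream<String> text = params.has("input")
                    ? env.readTextFile(params.get("input"))
                    : env.fromElements("To be, or not to be, that is the question");

            DataStream<Tuple2<String, Integer>> counts = text
                    .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                        @Override
                        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                            for (String word : line.toLowerCase().split("\\W+")) {
                                if (!word.isEmpty()) {
                                    out.collect(new Tuple2<>(word, 1));
                                }
                            }
                        }
                    })
                    .keyBy(0)
                    .sum(1);

            // --output selects the sink; without it the counts go to the TaskManagers' stdout.
            if (params.has("output")) {
                counts.writeAsText(params.get("output"));
            } else {
                counts.print();
            }

            env.execute("Streaming WordCount sketch");
        }
    }

When --output is omitted, the results of print() end up in the TaskManagers' stdout, i.e. the *.out files in each TM's log directory mentioned earlier in the thread.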