Here is the TaskManager log from when I ran "taskmanager.sh start": flink-Vidura-taskmanager-localhost.log <https://gist.github.com/anonymous/aef5a0bf8722feee9b97#file-flink-vidura-taskmanager-localhost-log>
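
In case it helps with the debugging: the thread below suspects that "localhost" resolves to different addresses for different processes, and suggests checking with telnet whether the JobManager port is reachable. A minimal sketch of both checks in Python follows. The helper names resolve_all and port_reachable are made up for illustration and are not part of Flink; port 6123 is the JobManager port that appears in the logs quoted below.

```python
import socket


def resolve_all(host, port):
    """Return the distinct IP addresses that `host` resolves to on this machine."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        address = info[4][0]  # sockaddr tuple starts with the IP address
        if address not in addresses:
            addresses.append(address)
    return addresses


def port_reachable(address, port, timeout=2.0):
    """Attempt a plain TCP connect, roughly what a telnet client would do."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 6123 is the JobManager RPC port seen in the quoted logs (assumed unchanged).
    for address in resolve_all("localhost", 6123):
        print(address, "reachable" if port_reachable(address, 6123) else "NOT reachable")
```

If "localhost" yields more than one address, or an address other than 127.0.0.1, that would be consistent with the mismatch described further down; it does not by itself prove that this is the cause.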
> On Feb 27, 2015, at 4:12 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>
> It depends on how you started Flink. If you started a local cluster, then the TaskManager log is contained in the JobManager log; we just don't see the respective log output in the snippet you posted. If you started a TaskManager independently, either by taskmanager.sh or by start-cluster.sh, then a file with the name format flink-<user>-taskmanager-<hostname>.log should be created in flink/log/. If the Flink directory is not shared by your cluster nodes, then you have to look on the machine on which you started the TaskManager.
>
> But since the JobManager binds to 127.0.0.1, I guess that you started a local cluster. Check whether you can find some logging statements from the logger org.apache.flink.runtime.taskmanager.TaskManager in your log. Maybe you can upload the corresponding log file to [1] and post a link here.
>
> Greets,
>
> Till
>
> [1] https://gist.github.com/
>
> On Thu, Feb 26, 2015 at 6:45 PM, Dulaj Viduranga <vidura...@icloud.com> wrote:
>
>> Hi,
>> Can you tell me where I can find the TaskManager logs? I can’t find them in the logs folder. I don’t suppose I should run taskmanager.sh as well, right?
>> I’m using OS X Yosemite. I’ll send you my ifconfig.
>> >> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384 >> options=3<RXCSUM,TXCSUM> >> inet6 ::1 prefixlen 128 >> inet 127.0.0.1 netmask 0xff000000 >> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 >> nd6 options=1<PERFORMNUD> >> gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280 >> stf0: flags=0<> mtu 1280 >> en0: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500 >> ether 60:03:08:a1:e0:f4 >> nd6 options=1<PERFORMNUD> >> media: autoselect (<unknown type>) >> status: inactive >> en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu >> 1500 >> options=60<TSO4,TSO6> >> ether 72:00:02:32:14:d0 >> media: autoselect <full-duplex> >> status: inactive >> en2: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu >> 1500 >> options=60<TSO4,TSO6> >> ether 72:00:02:32:14:d1 >> media: autoselect <full-duplex> >> status: inactive >> bridge0: flags=8822<BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500 >> options=63<RXCSUM,TXCSUM,TSO4,TSO6> >> ether 62:03:08:1a:fa:00 >> Configuration: >> id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0 >> maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200 >> root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0 >> ipfilter disabled flags 0x2 >> member: en1 flags=3<LEARNING,DISCOVER> >> ifmaxaddr 0 port 5 priority 0 path cost 0 >> member: en2 flags=3<LEARNING,DISCOVER> >> ifmaxaddr 0 port 6 priority 0 path cost 0 >> media: <unknown type> >> status: inactive >> p2p0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 2304 >> ether 02:03:08:a1:e0:f4 >> media: autoselect >> status: inactive >> awdl0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1452 >> ether 06:56:3d:f6:60:08 >> nd6 options=1<PERFORMNUD> >> media: autoselect >> status: inactive >> ppp0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500 >> inet 10.218.98.228 --> 10.64.64.64 netmask 0xff000000 >> utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1380 >> inet6 fe80::b0d4:d4be:7e62:e730%utun0 prefixlen 64 scopeid 0xb >> inet6 
fdd0:b291:7da7:9153:b0d4:d4be:7e62:e730 prefixlen 64 >> nd6 options=1<PERFORMNUD> >> >> >>> On Feb 26, 2015, at 10:48 PM, Stephan Ewen <se...@apache.org> wrote: >>> >>> Hi Dulaj! >>> >>> Thanks for helping to debug. >>> >>> My guess is that you are seeing now the same thing between JobManager and >>> TaskManager as you saw before between JobManager and JobClient. I have a >>> patch pending that should help the issue (see >>> https://issues.apache.org/jira/browse/FLINK-1608), let's see if that >> solves >>> it. >>> >>> What seems not right is that the JobManager initially accepted the >>> TaskManager and later the communication. Can you paste the TaskManager >> log >>> as well? >>> >>> Also: There must be something fairly unique about your network >>> configuration, as it works on all other setups that we use (locally, >> cloud, >>> test servers, YARN, ...). Can you paste your ipconfig / ifconfig by any >>> chance? >>> >>> Greetings, >>> Stephan >>> >>> >>> >>> On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga <vidura...@icloud.com> >>> wrote: >>> >>>> Hi, >>>> It’s great to help out. :) >>>> >>>> Setting 127.0.0.1 instead of “localhost” in >>>> jobmanager.rpc.address, helped to build the connection to the >> jobmanager. >>>> Apparently localhost resolving is different in webclient and the >>>> jobmanager. I think it’s good to set "jobmanager.rpc.address: >> 127.0.0.1" in >>>> future builds. >>>> But then I get this error when I tried to run examples. I don’t >>>> know if I should move this issue to another thread. If so please tell >> me. 
>>>> >>>> bin/flink run >>>> >> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar >>>> >> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt >>>> $FLINK_DIRECTORY/count >>>> >>>> >>>> 20:46:21,998 WARN org.apache.hadoop.util.NativeCodeLoader >>>> - Unable to load native-hadoop library for your platform... using >>>> builtin-java classes where applicable >>>> 02/26/2015 20:46:23 Job execution switched to status RUNNING. >>>> 02/26/2015 20:46:23 CHAIN DataSource (at >>>> getTextDataSet(WordCount.java:141) >>>> (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at >>>> main(WordCount.java:69)) -> Combine(SUM(1), at >> main(WordCount.java:72)(1/1) >>>> switched to SCHEDULED >>>> 02/26/2015 20:46:23 CHAIN DataSource (at >>>> getTextDataSet(WordCount.java:141) >>>> (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at >>>> main(WordCount.java:69)) -> Combine(SUM(1), at >> main(WordCount.java:72)(1/1) >>>> switched to DEPLOYING >>>> 02/26/2015 20:48:03 CHAIN DataSource (at >>>> getTextDataSet(WordCount.java:141) >>>> (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at >>>> main(WordCount.java:69)) -> Combine(SUM(1), at >> main(WordCount.java:72)(1/1) >>>> switched to FAILED >>>> akka.pattern.AskTimeoutException: Ask timed out on >>>> [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms] >>>> at >>>> >> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) >>>> at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) >>>> at >>>> >> scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) >>>> at >>>> >> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691) >>>> at >>>> >> 
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) >>>> at >>>> >> akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) >>>> at >>>> >> akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) >>>> at >>>> akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) >>>> at java.lang.Thread.run(Thread.java:745) >>>> >>>> 02/26/2015 20:48:03 Job execution switched to status FAILING. >>>> 02/26/2015 20:48:03 Reduce (SUM(1), at main(WordCount.java:72)(1/1) >>>> switched to CANCELED >>>> 02/26/2015 20:48:03 DataSink(CsvOutputFormat (path: >>>> >> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count, >>>> delimiter: ))(1/1) switched to CANCELED >>>> 02/26/2015 20:48:03 Job execution switched to status FAILED. >>>> org.apache.flink.client.program.ProgramInvocationException: The program >>>> execution failed. >>>> at org.apache.flink.client.program.Client.run(Client.java:344) >>>> at org.apache.flink.client.program.Client.run(Client.java:306) >>>> at org.apache.flink.client.program.Client.run(Client.java:300) >>>> at >>>> >> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) >>>> at >>>> >> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) >>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>> at >>>> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >>>> at >>>> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>> at java.lang.reflect.Method.invoke(Method.java:483) >>>> at >>>> >> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) >>>> at >>>> >> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) >>>> at org.apache.flink.client.program.Client.run(Client.java:250) >>>> at >>>> 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371) >>>> at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344) >>>> at >>>> >> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087) >>>> at >> org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114) >>>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job >>>> execution failed. >>>> at >>>> >> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:284) >>>> at >>>> >> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) >>>> at >>>> >> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) >>>> at >>>> >> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) >>>> at >>>> >> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37) >>>> at >>>> >> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30) >>>> at >>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) >>>> at >>>> >> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30) >>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465) >>>> at >>>> >> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:88) >>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) >>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487) >>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) >>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221) >>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231) >>>> at >>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>>> at >>>> >> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>>> at >>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>>> at >>>> >> 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on >>>> [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms] >>>> at >>>> >> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) >>>> at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) >>>> at >>>> >> scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) >>>> at >>>> >> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691) >>>> at >>>> >> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) >>>> at >>>> >> akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) >>>> at >>>> >> akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) >>>> at >>>> akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) >>>> at java.lang.Thread.run(Thread.java:745) >>>> >>>> The exception above occurred while trying to run your command. >>>> >>>> >>>>> On Feb 26, 2015, at 12:46 AM, Stephan Ewen <se...@apache.org> wrote: >>>>> >>>>> Addition: To check whether a port is reachable, I think the easiest >> thing >>>>> is to try and connect with a telnet client and see if the connection is >>>>> refused. >>>>> >>>>> On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen <se...@apache.org> >> wrote: >>>>> >>>>>> Okay, the problem seems to be that even though both the client and the >>>>>> jobmanager use "localhost" as the host name, they resolve this to >>>> different >>>>>> IP addresses: In one case 127.0.0.1 in the other case 10.216.177.146 >>>>>> >>>>>> Also, the 127.0.0.1 address cannot communicate to 10.216.177.146 >>>>>> apparently. >>>>>> >>>>>> Can you help us debug this by checking the following: >>>>>> >>>>>> - Can you try and set "jobmanager.rpc.address" to 127.0.0.1 and see if >>>>>> that solves it? 
>>>>>> - Can you try and set "jobmanager.rpc.address" to the other address >>>> (10.216.177.146 >>>>>> or so) and see if that solves it? >>>>>> - Can you do "start-cluster.sh", rather than "start-local.sh" and see >>>>>> whether the webfrontend displays that the TaskManager connects? >>>>>> - As a hard core test: Can you bring up the jobmanager, check where it >>>>>> connects (10.216.192.98:6123 or so) and see whether the port is >>>> reachable? >>>>>> >>>>>> We have recently updated how the Akka URLs are build, to work around a >>>>>> limitation in Akka. Seems that did not yet fully solve the issue. >>>>>> >>>>>> Thanks for helping us debug this, it is not the easiest immigration >>>>>> experience, but the outcome is probably extremely valuable for the >>>> project >>>>>> :-) >>>>>> >>>>>> Greetings, >>>>>> Stephan >>>>>> >>>>>> >>>>>> On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga < >> vidura...@icloud.com> >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> Sorry for the delay to reply on this issue. >>>>>>> the jobmanager.rpc.address is set to “localhost” already in >> conf.yaml. >>>>>>> This can’t be an issue because the job manager web interface works >> fine >>>>>>> which also runs on localhost >>>>>>> >>>>>>> bin/flink run <jar> doesn’t seem to work either. Let me send you my >>>>>>> command and the result in terminal. >>>>>>> >>>>>>> bin/flink run >>>>>>> >>>> >> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar >>>>>>> >>>> >> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt >>>>>>> $FLINK_DIRECTORY/count >>>>>>> >>>>>>> 20:32:16,442 WARN org.apache.hadoop.util.NativeCodeLoader >>>>>>> - Unable to load native-hadoop library for your platform... 
>> using >>>>>>> builtin-java classes where applicable >>>>>>> org.apache.flink.client.program.ProgramInvocationException: Could not >>>>>>> build up connection to JobManager. >>>>>>> at org.apache.flink.client.program.Client.run(Client.java:327) >>>>>>> at org.apache.flink.client.program.Client.run(Client.java:306) >>>>>>> at org.apache.flink.client.program.Client.run(Client.java:300) >>>>>>> at >>>>>>> >>>> >> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) >>>>>>> at >>>>>>> >>>> >> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>>>>> at >>>>>>> >>>> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >>>>>>> at >>>>>>> >>>> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>>>>> at java.lang.reflect.Method.invoke(Method.java:483) >>>>>>> at >>>>>>> >>>> >> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) >>>>>>> at >>>>>>> >>>> >> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) >>>>>>> at org.apache.flink.client.program.Client.run(Client.java:250) >>>>>>> at >>>>>>> >>>> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371) >>>>>>> at >> org.apache.flink.client.CliFrontend.run(CliFrontend.java:344) >>>>>>> at >>>>>>> >>>> >> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087) >>>>>>> at >>>> org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114) >>>>>>> Caused by: java.io.IOException: JobManager at akka.tcp:// >>>>>>> flink@10.216.177.146:6123/user/jobmanager not reachable. Please make >>>>>>> sure that the JobManager is running and its port is reachable. 
>>>>>>> at >>>>>>> >>>> >> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897) >>>>>>> at >>>>>>> >>>> >> org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151) >>>>>>> at >>>>>>> >>>> >> org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142) >>>>>>> at >>>>>>> >>>> >> org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125) >>>>>>> at >>>>>>> >>>> >> org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala) >>>>>>> at org.apache.flink.client.program.Client.run(Client.java:322) >>>>>>> ... 15 more >>>>>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out >>>> after >>>>>>> [10000 milliseconds] >>>>>>> at >>>>>>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) >>>>>>> at >>>>>>> >> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) >>>>>>> at >>>>>>> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) >>>>>>> at >>>>>>> >>>> >> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) >>>>>>> at scala.concurrent.Await$.result(package.scala:107) >>>>>>> at >>>>>>> >>>> >> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893) >>>>>>> ... 20 more >>>>>>> >>>>>>> The exception above occurred while trying to run your command. >>>>>>> >>>>>>> >>>>>>>> On Feb 25, 2015, at 1:29 AM, Stephan Ewen <se...@apache.org> wrote: >>>>>>>> >>>>>>>> BTW: Does still work if you enter "localhost" for >>>>>>> "jobmanager.rpc.address" >>>>>>>> in your flink-conf.yaml ? >>>>>>>> >>>>>>>> On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen <se...@apache.org> >>>> wrote: >>>>>>>> >>>>>>>>> Hi! >>>>>>>>> >>>>>>>>> I think that this is a problem in the current master (probably in >>>> there >>>>>>>>> since a few days ago). I am fixing it... >>>>>>>>> >>>>>>>>> Thanks for reporting it! 
>>>>>>>>>
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>> On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Dulaj!
>>>>>>>>>>
>>>>>>>>>> The log suggests that the JobManager binds itself to the IP address 10.216.192.98 and the WebClient runs at 127.0.0.1.
>>>>>>>>>>
>>>>>>>>>> The 127.0.0.1 actor system cannot connect to 10.216.192.98.
>>>>>>>>>>
>>>>>>>>>> Let me verify whether this is a quirk of your particular setup, or a bug recently introduced in the 0.9-SNAPSHOT.
>>>>>>>>>>
>>>>>>>>>> Does the command line work for you? ("bin/flink run <jar>")
>>>>>>>>>>
>>>>>>>>>> taskmanager.numberOfTaskSlots: -1 is also okay; this will mean that the default of '1' is used.
>>>>>>>>>>
>>>>>>>>>> Greetings,
>>>>>>>>>> Stephan
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga <vidura...@icloud.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Is taskmanager.numberOfTaskSlots: -1 normal?
>>>>>>>>>>>
>>>>>>>>>>>> On Feb 24, 2015, at 9:44 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I could not find the log files attached to your mails. I think the mailing lists do not accept attachments.
>>>>>>>>>>>> Can you put the logs on gist.github.com?
>>>>>>>>>>>>
>>>>>>>>>>>> The configuration values are documented here:
>>>>>>>>>>>> http://flink.apache.org/docs/0.8/config.html
>>>>>>>>>>>> For the webclient's port, it's called webclient.port.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga <vidura...@icloud.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I tried to kill the job manager manually in the terminal and start it again, but no luck. Also, could you tell me if it’s possible to change the webclient’s port (8080)?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Feb 24, 2015, at 1:41 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Dulaj!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As a contributor, I would go against the latest version, which is 0.9-SNAPSHOT.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It may be in your case that the JobManager actor is down, but the process still lingers. (BTW: I have a patch pending that makes sure the process disappears when the actor is down.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you have a look at the log "flink-<user>-jobmanager-<host>-.log" and see if there are any errors logged?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 24.02.2015 at 06:29, "Dulaj Viduranga" <vidura...@icloud.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The JobManager seems to run fine. I don't know. When I tried to run start-local.sh again, it shows the PID of the running JobManager, and :8081 also runs fine. I want to contribute to the project and I could get a little boost if I could see the capabilities of Flink. :)
>>>>>>>>>>>>>>> Will it be OK to use 0.8.1 as a developer?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Feb 24, 2015, at 04:15 AM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Dulaj,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That error message indicates that the JobManager is not running. Are you sure that the JobManager runs properly? Anything in the JobManager logs?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> BTW: The 0.9 branch is under heavy development / changes. That is why it may behave a bit differently on different days right now.
I >> would >>>>>>>>>>>>> recommend >>>>>>>>>>>>>>> to use the 0.8.1 release for a stable experience. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Greetings, >>>>>>>>>>>>>>> Stephan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger < >>>>>>>>>>> rmetz...@apache.org> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you for the quick reply. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The log you've send is from the webclient. Can you also send >>>> the >>>>>>>>>>> log of >>>>>>>>>>>>> the >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> JobManager? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga < >>>>>>>>>>> vidura...@icloud.com> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yes. It seams it is not a problem with the arguments. I >> tried >>>>>>> two >>>>>>>>>>> days >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> different error occurs. It seams the web client can’t >> connect >>>> to >>>>>>>>>>> the >>>>>>>>>>>>> job >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> manager although it is running >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Right now, I can’t even get the webclient to run. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ./bin/start-webclient.sh >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> executes fine but I cannot connect to localhost:8080 (even >>>> with >>>>>>>>>>> telnet >>>>>>>>>>>>> or >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> curl) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Here is the log for jobManager >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 23:22:31,933 INFO >>>> org.apache.flink.client.web.WebInterfaceServer >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Setting up web frontend server, using web-root directory >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 'jar: >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>> >>>> >> file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs >>>>>>>>>>>>>>> '. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 23:22:31,934 INFO >>>> org.apache.flink.client.web.WebInterfaceServer >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Web frontend server will store temporary files in >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded >>>>>>> jobs >>>>>>>>>>> in >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>> '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs', >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> plan-json-dumps in >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>> '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 23:22:31,934 INFO >>>> org.apache.flink.client.web.WebInterfaceServer >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Web-frontend will submit jobs to nephele job-manager on >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> localhost, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> port 6123. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Slf4jLogger started >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 23:22:32,625 INFO Remoting >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Starting remoting >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 23:22:32,838 INFO Remoting >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Remoting started; listening on addresses :[akka.tcp:// >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> JobsInfoServletActorSystem@127.0.0.1:51517] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 23:23:48,119 WARN Remoting >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Tried to associate with unreachable remote address >>>>>>> [akka.tcp:// >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> flink@10.218.98.169:6123]. Address is now gated for 5000 >> ms, >>>>>>> all >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> messages >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> to this address will be delivered to dead letters. 
Reason: >>>>>>>>>>> Operation >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> timed >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> out: /10.218.98.169:6123 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 23:23:48,124 ERROR org.apache.flink.client.WebFrontend >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Unexpected exception: Could not find job manager at >>>> specified >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> address akka.flink@10.218.98.169:6123/user/jobmanager >> '>tcp:// >>>>>>>>>>>>>>> flink@10.218.98.169:6123/user/jobmanager. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> java.lang.RuntimeException: Could not find job manager at >>>>>>> specified >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> address akka.flink@10.218.98.169:6123/user/jobmanager >> '>tcp:// >>>>>>>>>>>>>>> flink@10.218.98.169:6123/user/jobmanager. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>> >>>> >> org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>> >>>> >> org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> at >>>> org.apache.flink.client.WebFrontend.main(WebFrontend.java:74) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Feb 23, 2015, at 11:46 PM, Robert Metzger < >>>>>>> rmetz...@apache.org >>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> you said in the other email thread that the error only >> occurs >>>>>>> for >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Wordcount, not for Kmeans. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Can you copy me the commands for both examples? 
>>>>>>>>>>>>>>>>> I cannot really believe that there is a difference between the two jobs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can you also send us the contents of the jobmanager log file?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Robert
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga <vidura...@icloud.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I’m getting "Could not build up connection to JobManager." when I tried to run the WordCount example. Can anyone help?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dulaj