Hi Yan Fang, I was able to deploy the file to HDFS and I can see it on all my nodes, but when I tried running the job I got this error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
    at org.apache.samza.job.yarn.ClientHelper.submitApplication(ClientHelper.scala:111)
    at org.apache.samza.job.yarn.YarnJob.submit(YarnJob.scala:55)
    at org.apache.samza.job.yarn.YarnJob.submit(YarnJob.scala:48)
    at org.apache.samza.job.JobRunner.run(JobRunner.scala:62)
    at org.apache.samza.job.JobRunner$.main(JobRunner.scala:37)
    at org.apache.samza.job.JobRunner.main(JobRunner.scala)

This is my yarn.package.path config:

yarn.package.path=hdfs://telles-master-samza:50070/samza-job-package-0.7.0-dist.tar.gz

Thanks in advance

On Mon, Aug 11, 2014 at 3:00 PM, Yan Fang <[email protected]> wrote:

> Hi Telles,
>
> In terms of "*I tried pushing the tar file to HDFS but I got an error from
> hadoop saying that it couldn’t find core-site.xml file*.", I guess you set
> the HADOOP_CONF_DIR variable and made it point to ~/.samza/conf. You can do
> 1) make the HADOOP_CONF_DIR point to the directory where your conf files
> are, such as /etc/hadoop/conf. Or 2) copy the config files to
> ~/.samza/conf. Thank you,
>
> Cheer,
>
> Fang, Yan
> [email protected]
> +1 (206) 849-4108
>
>
> On Mon, Aug 11, 2014 at 7:40 AM, Chris Riccomini <
> [email protected]> wrote:
>
> > Hey Telles,
> >
> > To get YARN working with the HTTP file system, you need to follow the
> > instructions on:
> >
> > http://samza.incubator.apache.org/learn/tutorials/0.7.0/run-in-multi-node-yarn.html
> >
> > In the "Set Up Http Filesystem for YARN" section.
> >
> > You shouldn't need to compile anything (no Gradle, which is what your
> > stack trace is showing). This setup should be done for all of the NMs,
> > since they will be the ones downloading your job's package (from
> > yarn.package.path).
> >
> > Cheers,
> > Chris
> >
> > On 8/9/14 9:44 PM, "Telles Nobrega" <[email protected]> wrote:
> >
> > >Hi again, I tried installing the scala libs but the Http problem still
> > >occurs. I realised that I need to compile incubator samza in the machines
> > >that I'm going to run the jobs, but the compilation fails with this huge
> > >message:
> > >
> > >#
> > ># There is insufficient memory for the Java Runtime Environment to continue.
> > ># Native memory allocation (malloc) failed to allocate 3946053632 bytes for committing reserved memory.
> > ># An error report file with more information is saved as:
> > ># /home/ubuntu/incubator-samza/samza-kafka/hs_err_pid2506.log
> > >Could not write standard input into: Gradle Worker 13.
> > >java.io.IOException: Broken pipe
> > >    at java.io.FileOutputStream.writeBytes(Native Method)
> > >    at java.io.FileOutputStream.write(FileOutputStream.java:345)
> > >    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> > >    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> > >    at org.gradle.process.internal.streams.ExecOutputHandleRunner.run(ExecOutputHandleRunner.java:53)
> > >    at org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:66)
> > >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >    at java.lang.Thread.run(Thread.java:744)
> > >Process 'Gradle Worker 13' finished with non-zero exit value 1
> > >org.gradle.process.internal.ExecException: Process 'Gradle Worker 13' finished with non-zero exit value 1
> > >    at org.gradle.process.internal.DefaultExecHandle$ExecResultImpl.assertNormalExitValue(DefaultExecHandle.java:362)
> > >    at org.gradle.process.internal.DefaultWorkerProcess.onProcessStop(DefaultWorkerProcess.java:89)
> > >    at org.gradle.process.internal.DefaultWorkerProcess.access$000(DefaultWorkerProcess.java:33)
> > >    at org.gradle.process.internal.DefaultWorkerProcess$1.executionFinished(DefaultWorkerProcess.java:55)
> > >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >    at java.lang.reflect.Method.invoke(Method.java:606)
> > >    at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> > >    at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> > >    at org.gradle.listener.BroadcastDispatch.dispatch(BroadcastDispatch.java:81)
> > >    at org.gradle.listener.BroadcastDispatch.dispatch(BroadcastDispatch.java:30)
> > >    at org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> > >    at com.sun.proxy.$Proxy46.executionFinished(Unknown Source)
> > >    at org.gradle.process.internal.DefaultExecHandle.setEndStateInfo(DefaultExecHandle.java:212)
> > >    at org.gradle.process.internal.DefaultExecHandle.finished(DefaultExecHandle.java:309)
> > >    at org.gradle.process.internal.ExecHandleRunner.completed(ExecHandleRunner.java:108)
> > >    at org.gradle.process.internal.ExecHandleRunner.run(ExecHandleRunner.java:88)
> > >    at org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:66)
> > >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >    at java.lang.Thread.run(Thread.java:744)
> > >OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x000000070a6c0000, 3946053632, 0) failed; error='Cannot allocate memory' (errno=12)
> > >#
> > ># There is insufficient memory for the Java Runtime Environment to continue.
> > ># Native memory allocation (malloc) failed to allocate 3946053632 bytes for committing reserved memory.
> > ># An error report file with more information is saved as:
> > ># /home/ubuntu/incubator-samza/samza-kafka/hs_err_pid2518.log
> > >Could not write standard input into: Gradle Worker 14.
> > >java.io.IOException: Broken pipe
> > >    at java.io.FileOutputStream.writeBytes(Native Method)
> > >    at java.io.FileOutputStream.write(FileOutputStream.java:345)
> > >    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> > >    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> > >    at org.gradle.process.internal.streams.ExecOutputHandleRunner.run(ExecOutputHandleRunner.java:53)
> > >    at org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:66)
> > >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >    at java.lang.Thread.run(Thread.java:744)
> > >Process 'Gradle Worker 14' finished with non-zero exit value 1
> > >org.gradle.process.internal.ExecException: Process 'Gradle Worker 14' finished with non-zero exit value 1
> > >    at org.gradle.process.internal.DefaultExecHandle$ExecResultImpl.assertNormalExitValue(DefaultExecHandle.java:362)
> > >    at org.gradle.process.internal.DefaultWorkerProcess.onProcessStop(DefaultWorkerProcess.java:89)
> > >    at org.gradle.process.internal.DefaultWorkerProcess.access$000(DefaultWorkerProcess.java:33)
> > >    at org.gradle.process.internal.DefaultWorkerProcess$1.executionFinished(DefaultWorkerProcess.java:55)
> > >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >    at java.lang.reflect.Method.invoke(Method.java:606)
> > >    at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> > >    at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> > >    at org.gradle.listener.BroadcastDispatch.dispatch(BroadcastDispatch.java:81)
> > >    at org.gradle.listener.BroadcastDispatch.dispatch(BroadcastDispatch.java:30)
> > >    at org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> > >    at com.sun.proxy.$Proxy46.executionFinished(Unknown Source)
> > >    at org.gradle.process.internal.DefaultExecHandle.setEndStateInfo(DefaultExecHandle.java:212)
> > >    at org.gradle.process.internal.DefaultExecHandle.finished(DefaultExecHandle.java:309)
> > >    at org.gradle.process.internal.ExecHandleRunner.completed(ExecHandleRunner.java:108)
> > >    at org.gradle.process.internal.ExecHandleRunner.run(ExecHandleRunner.java:88)
> > >    at org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:66)
> > >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >    at java.lang.Thread.r
> > >
> > >Do I need more memory for my machines? Each already has 4GB. I really
> > >need to have this running. I'm not sure which way is best http or hdfs
> > >which one you suggest and how can i solve my problem for each case.
> > >
> > >Thanks in advance and sorry for bothering this much.
> > >On 10 Aug 2014, at 00:20, Telles Nobrega <[email protected]> wrote:
> > >
> > >> Hi Chris, now I have the tar file in my RM machine, and the yarn path
> > >> points to it. I changed the core-site.xml to use HttpFileSystem instead
> > >> of HDFS now it is failing with
> > >>
> > >> Application application_1407640485281_0001 failed 2 times due to AM
> > >> Container for appattempt_1407640485281_0001_000002 exited with
> > >> exitCode:-1000 due to: java.lang.ClassNotFoundException: Class
> > >> org.apache.samza.util.hadoop.HttpFileSystem not found
> > >>
> > >> I think I can solve this just installing scala files from the samza
> > >> tutorial, can you confirm that?
> > >>
> > >> On 09 Aug 2014, at 08:34, Telles Nobrega <[email protected]> wrote:
> > >>
> > >>> Hi Chris,
> > >>>
> > >>> I think the problem is that I forgot to update the yarn.job.package.
> > >>> I will try again to see if it works now.
> > >>>
> > >>> I have one more question, how can I stop (command line) the jobs
> > >>> running in my topology, for the experiment that I will run, I need to
> > >>> run the same job in 4 minutes intervals. So I need to kill it, clean
> > >>> the kafka topics and rerun.
> > >>>
> > >>> Thanks in advance.
> > >>>
> > >>> On 08 Aug 2014, at 12:41, Chris Riccomini <[email protected]> wrote:
> > >>>
> > >>>> Hey Telles,
> > >>>>
> > >>>>>> Do I need to have the job folder on each machine in my cluster?
> > >>>>
> > >>>> No, you should not need to do this. There are two ways to deploy your
> > >>>> tarball to the YARN grid.
> > >>>> One is to put it in HDFS, and the other is to
> > >>>> put it on an HTTP server. The link to running a Samza job in a multi-node
> > >>>> YARN cluster describes how to do both (either HTTP server or HDFS).
> > >>>>
> > >>>> In both cases, once the tarball is put in on the HTTP/HDFS server(s), you
> > >>>> must update yarn.package.path to point to it. From there, the YARN NM
> > >>>> should download it for you automatically when you start your job.
> > >>>>
> > >>>> * Can you send along a paste of your job config?
> > >>>>
> > >>>> Cheers,
> > >>>> Chris
> > >>>>
> > >>>> On 8/8/14 8:04 AM, "Claudio Martins" <[email protected]> wrote:
> > >>>>
> > >>>>> Hi Telles, it looks to me that you forgot to update the
> > >>>>> "yarn.package.path"
> > >>>>> attribute in your config file for the task.
> > >>>>>
> > >>>>> - Claudio Martins
> > >>>>> Head of Engineering
> > >>>>> MobileAware USA Inc. / www.mobileaware.com
> > >>>>> office: +1 617 986 5060 / mobile: +1 617 480 5288
> > >>>>> linkedin: www.linkedin.com/in/martinsclaudio
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Aug 8, 2014 at 10:55 AM, Telles Nobrega <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> this is my first time trying to run a job on a multinode environment. I
> > >>>>>> have the cluster set up, I can see in the GUI that all nodes are
> > >>>>>> working.
> > >>>>>> Do I need to have the job folder on each machine in my cluster?
> > >>>>>> - The first time I tried running with the job on the namenode machine
> > >>>>>> and it failed saying:
> > >>>>>>
> > >>>>>> Application application_1407509228798_0001 failed 2 times due to AM
> > >>>>>> Container for appattempt_1407509228798_0001_000002 exited with exitCode:
> > >>>>>> -1000 due to: File
> > >>>>>> file:/home/ubuntu/alarm-samza/samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz
> > >>>>>> does not exist
> > >>>>>>
> > >>>>>> So I copied the folder to each machine in my cluster and got this error:
> > >>>>>>
> > >>>>>> Application application_1407509228798_0002 failed 2 times due to AM
> > >>>>>> Container for appattempt_1407509228798_0002_000002 exited with exitCode:
> > >>>>>> -1000 due to: Resource
> > >>>>>> file:/home/ubuntu/alarm-samza/samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz
> > >>>>>> changed on src filesystem (expected 1407509168000, was 1407509434000
> > >>>>>>
> > >>>>>> What am I missing?
> > >>>>>>
> > >>>>>> p.s.: I followed this
> > >>>>>> <https://github.com/yahoo/samoa/wiki/Executing-SAMOA-with-Apache-Samza>
> > >>>>>> tutorial and this
> > >>>>>> <http://samza.incubator.apache.org/learn/tutorials/0.7.0/run-in-multi-node-yarn.html>
> > >>>>>> to set up the cluster.
> > >>>>>>
> > >>>>>> Help is much appreciated.
> > >>>>>>
> > >>>>>> Thanks in advance.
> > >>>>>>
> > >>>>>> --
> > >>>>>> ------------------------------------------
> > >>>>>> Telles Mota Vidal Nobrega
> > >>>>>> M.sc. Candidate at UFCG
> > >>>>>> B.sc. in Computer Science at UFCG
> > >>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
> > >>>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
>

--
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
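
A note on the yarn.package.path value quoted at the top of this thread: the URI uses port 50070, which in a default Hadoop 2.x setup is the NameNode web UI (HTTP) port, while hdfs:// URIs normally point at the NameNode RPC port (commonly 8020, or whatever fs.defaultFS in core-site.xml says). The following is only a minimal sketch for sanity-checking such a value before submitting a job. The helper name check_package_path and the two port constants are assumptions made for illustration; they are not part of Samza or Hadoop. The sketch also does not address the "No FileSystem for scheme: hdfs" exception itself, which is raised on the client side and is often a sign that the HDFS client classes are missing from the job runner's classpath.

```python
from urllib.parse import urlparse

# Assumption: Hadoop 2.x default ports. Check hdfs-site.xml / core-site.xml
# on the actual cluster, since both values can be overridden.
NAMENODE_HTTP_PORT = 50070  # NameNode web UI
NAMENODE_RPC_PORT = 8020    # NameNode RPC, what hdfs:// URIs normally use


def check_package_path(value):
    """Return a list of warnings for a yarn.package.path value.

    Hypothetical helper for illustration only.
    """
    uri = urlparse(value)
    warnings = []
    if uri.scheme == "file":
        # The thread shows file: paths failing with "does not exist" and
        # "changed on src filesystem" when NodeManagers localize the package.
        warnings.append("file: paths are resolved locally on each NodeManager")
    elif uri.scheme == "hdfs" and uri.port == NAMENODE_HTTP_PORT:
        warnings.append(
            "port %d is usually the NameNode web UI; hdfs:// normally "
            "uses the RPC port (often %d)" % (NAMENODE_HTTP_PORT, NAMENODE_RPC_PORT)
        )
    elif uri.scheme not in ("hdfs", "http"):
        warnings.append("unexpected scheme %r; the thread covers hdfs and http" % uri.scheme)
    return warnings


if __name__ == "__main__":
    path = "hdfs://telles-master-samza:50070/samza-job-package-0.7.0-dist.tar.gz"
    for warning in check_package_path(path):
        print("warning:", warning)
```

If the check flags the port, the value to try instead is the authority configured in fs.defaultFS on the cluster rather than a guessed default.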
