Thanks for your help Chris. Got it to work now. I will test my case and documentation further. I can edit the Samza documentation to reflect any changes.
- Shekar

On Thu, Mar 12, 2015 at 5:19 PM, Chris Riccomini <criccom...@apache.org> wrote:

> Hey Shekar,
>
> Yes, this is definitely a classpath issue. The pastebin you sent does not include any of the samza-core/samza-yarn/scala JARs. This is rather strange, since you said you put the JARs in this path:
>
> /home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/
>
> And I do see *other* JARs listed with this path. Are you sure you put the Samza JARs on *all* machines, not just the RM machine? According to the yarn-default.xml logs, it says:
>
> CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries. When this value is empty, the following default CLASSPATH for YARN applications would be used. For Linux: $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $HADOOP_YARN_HOME/share/hadoop/yarn/*, $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
>
> So, it seems like it should pick up the JARs, if they're in the NM's directory.
>
> The exception that you're now seeing seems to suggest that one of the Samza containers is failing:
>
> Container for appattempt_1426204312971_0001_000002 exited with exitCode: 1
>
> The _000002 suffix indicates a non-AM failure (i.e. the Samza container failed, not the Samza AM). Can you check the AM logs, and find the http://... link to the container logs? It should give a hint about why the container failed.
>
> Cheers,
> Chris
>
> On Thu, Mar 12, 2015 at 4:58 PM, Shekar Tippur <ctip...@gmail.com> wrote:
>
> > Chris,
> >
> > Made some progress.
> >
> > By adding yarn.application.classpath to yarn-site.xml, I am no longer getting class not found error.
> > However, I am getting a different error:
> >
> > Application application_1426204312971_0001 failed 2 times due to AM Container for appattempt_1426204312971_0001_000002 exited with exitCode: 1 due to: Exception from container-launch: ExitCodeException exitCode=1:
> > ExitCodeException exitCode=1:
> > at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> > at org.apache.hadoop.util.Shell.run(Shell.java:455)
> > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
> > at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> > at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
> > at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> > Container exited with a non-zero exit code 1
> > .Failing this attempt.. Failing the application.
> >
> > Looks like a common issue with yarn but not sure how to resolve as yet.
> >
> > - Shekar
> >
> > On Thu, Mar 12, 2015 at 1:44 PM, Shekar Tippur <ctip...@gmail.com> wrote:
> >
> > > Chris - Here it is.
> > >
> > > http://pastebin.com/c3e21Hzf
> > >
> > > - Shekar
> > >
> > > On Thu, Mar 12, 2015 at 10:58 AM, Chris Riccomini <criccom...@apache.org> wrote:
> > >
> > >> This is the line that I'm interested in:
> > >>
> > >> STARTUP_MSG: classpath ....
> > >>
> > >> On Thu, Mar 12, 2015 at 10:55 AM, Chris Riccomini <criccom...@apache.org> wrote:
> > >>
> > >> > Hey Shekar,
> > >> >
> > >> > Could you paste the full log on pastebin?
> > >> > It really seems like something's missing from the classpath. If samza-yarn is there, it should be able to see that file. I think the full log has a dump of the classpath. If it doesn't, could you paste the line where the YARN NM is starting up, and dumps the full classpath?
> > >> >
> > >> > Cheers,
> > >> > Chris
> > >> >
> > >> > On Thu, Mar 12, 2015 at 10:17 AM, Shekar Tippur <ctip...@gmail.com> wrote:
> > >> >
> > >> >> I think all these jars are in place (Under $HADOOP_YARN_HOME/share/hadoop/hdfs/lib)
> > >> >>
> > >> >> - Shekar
> > >> >>
> > >> >> On Thu, Mar 12, 2015 at 9:36 AM, Chris Riccomini <criccom...@apache.org> wrote:
> > >> >>
> > >> >> > Hey Shekar,
> > >> >> >
> > >> >> > You need that samza-yarn file on your RM/NM's classpath, along with scala. We missed this in the docs, and are tracking the issue here:
> > >> >> >
> > >> >> > https://issues.apache.org/jira/browse/SAMZA-456
> > >> >> >
> > >> >> > You'll also need samza-core in the classpath, based on the discussion on SAMZA-456. Sorry about that. If you want to update the tutorial when you get your cluster working, and submit a patch, that'd be great!
> > >> >> > :)
> > >> >> >
> > >> >> > Cheers,
> > >> >> > Chris
> > >> >> >
> > >> >> > On Wed, Mar 11, 2015 at 9:43 PM, Shekar Tippur <ctip...@gmail.com> wrote:
> > >> >> >
> > >> >> > > Here is the corresponding log:
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,665 INFO [AsyncDispatcher event handler] localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz transitioned from INIT to DOWNLOADING
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,665 INFO [AsyncDispatcher event handler] localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(679)) - Created localizer for container_1426121400423_2587_01_000001
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,669 INFO [LocalizerRunner for container_1426121400423_2587_01_000001] localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1107)) - Writing credentials to the nmPrivate file /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/container_1426121400423_2587_01_000001.tokens. Credentials list:
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,675 INFO [DeletionService #0] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(378)) - Deleting path : /home/hadoop/hadoop-2.5.2/logs/userlogs/application_1426120927668_0010
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,676 INFO [LocalizerRunner for container_1426121400423_2587_01_000001] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createUserCacheDirs(469)) - Initializing user root
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,685 INFO [LocalizerRunner for container_1426121400423_2587_01_000001] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(103)) - Copying from /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/container_1426121400423_2587_01_000001.tokens to /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/container_1426121400423_2587_01_000001.tokens
> > >> >> > >
> > >> >> > > *2015-03-11 20:43:09,685 INFO [LocalizerRunner for container_1426121400423_2587_01_000001] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(105)) - CWD set to /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587 = file:/tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587*
> > >> >> > >
> > >> >> > > *2015-03-11 20:43:09,716 INFO [IPC Server handler 2 on 8040] localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1007)) - DEBUG: FAILED { http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz, 0, ARCHIVE, null }, java.lang.ClassNotFoundException: Class org.apache.samza.util.hadoop.HttpFileSystem not found*
> > >> >> > >
> > >> >> > > *2015-03-11 20:43:09,716 INFO [IPC Server handler 2 on 8040] localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz(->/tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/filecache/10/hello-samza-0.8.0-dist.tar.gz) transitioned from DOWNLOADING to FAILED*
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,717 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(918)) - Container container_1426121400423_2587_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,717 INFO [AsyncDispatcher event handler] localizer.LocalResourcesTrackerImpl (LocalResourcesTrackerImpl.java:handle(151)) - Container container_1426121400423_2587_01_000001 sent RELEASE event on a resource request { http://sprfargas102:8000/hello-samza-0.8.0-dist.tar.gz, 0, ARCHIVE, null } not present in cache.
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,717 WARN [AsyncDispatcher event handler] nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=root OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1426121400423_2587 CONTAINERID=container_1426121400423_2587_01_000001
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,717 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(918)) - Container container_1426121400423_2587_01_000001 transitioned from LOCALIZATION_FAILED to DONE
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,717 INFO [AsyncDispatcher event handler] application.Application (ApplicationImpl.java:transition(340)) - Removing container_1426121400423_2587_01_000001 from application application_1426121400423_2587
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,717 INFO [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:handle(196)) - Got event CONTAINER_STOP for appId application_1426121400423_2587
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,717 INFO [DeletionService #2] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(369)) - Deleting absolute path : /tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/container_1426121400423_2587_01_000001
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,717 WARN [DeletionService #2] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(372)) - delete returned false for path: [/tmp/hadoop-hadoop/nm-local-dir/usercache/root/appcache/application_1426121400423_2587/container_1426121400423_2587_01_000001]
> > >> >> > >
> > >> >> > > 2015-03-11 20:43:09,718 WARN [LocalizerRunner for container_1426121400423_2587_01_000001] ipc.Client (Client.java:call(1389)) - interrupted waiting to send rpc request to server
> > >> >> > > java.lang.InterruptedException
> > >> >> > > at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
> > >> >> > > at java.util.concurrent.FutureTask.get(FutureTask.java:187)
> > >> >> > > at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1030)
> > >> >> > > at org.apache.hadoop.ipc.Client.call(Client.java:1384)
> > >> >> > > at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> > >> >> > > at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> > >> >> > > at com.sun.proxy.$Proxy29.heartbeat(Unknown Source)
> > >> >> > > at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
> > >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:255)
> > >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> > >> >> > > at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
> > >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1073)
> > >> >> > > java.io.IOException: java.lang.InterruptedException
> > >> >> > > at org.apache.hadoop.ipc.Client.call(Client.java:1390)
> > >> >> > > at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> > >> >> > > at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> > >> >> > > at com.sun.proxy.$Proxy29.heartbeat(Unknown Source)
> > >> >> > > at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
> > >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:255)
> > >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> > >> >> > > at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
> > >> >> > > at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1073)
> > >> >> > > Caused by: java.lang.InterruptedException
> > >> >> > > at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
> > >> >> > > at java.util.concurrent.FutureTask.get(FutureTask.java:187)
> > >> >> > > at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1030)
> > >> >> > > at org.apache.hadoop.ipc.Client.call(Client.java:1384)
> > >> >> > > ... 8 more
> > >> >> > >
> > >> >> > > On Wed, Mar 11, 2015 at 4:56 PM, Shekar Tippur <ctip...@gmail.com> wrote:
> > >> >> > >
> > >> >> > > > Hello,
> > >> >> > > >
> > >> >> > > > Sorry to reopen this topic. I had set up YARN a couple of months ago and can't seem to replicate this now.
> > >> >> > > >
> > >> >> > > > I see that I have done everything listed here
> > >> >> > > >
> > >> >> > > > http://samza.apache.org/learn/tutorials/0.7.0/run-in-multi-node-yarn.html
> > >> >> > > >
> > >> >> > > > I see this error on the application side
> > >> >> > > >
> > >> >> > > > Application application_1426115467623_0492 failed 2 times due to AM Container for appattempt_1426115467623_0492_000002 exited with exitCode: -1000 due to: java.lang.ClassNotFoundException: Class org.apache.samza.util.hadoop.HttpFileSystem not found
> > >> >> > > > .Failing this attempt.. Failing the application.
> > >> >> > > >
> > >> >> > > > I see that /home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/samza-yarn_2.10-0.8.0.jar has that particular class
> > >> >> > > >
> > >> >> > > > 1739 Tue Nov 25 10:51:40 PST 2014 org/apache/samza/util/hadoop/HttpFileSystem$$anonfun$getFileStatus$1.class
> > >> >> > > > 1570 Tue Nov 25 10:51:40 PST 2014 org/apache/samza/util/hadoop/HttpFileSystem$$anonfun$initialize$1.class
> > >> >> > > > 1597 Tue Nov 25 10:51:40 PST 2014 org/apache/samza/util/hadoop/HttpFileSystem$$anonfun$open$1.class
> > >> >> > > > 1797 Tue Nov 25 10:51:40 PST 2014 org/apache/samza/util/hadoop/HttpFileSystem$$anonfun$open$2.class
> > >> >> > > > 9549 Tue Nov 25 10:51:40 PST 2014 org/apache/samza/util/hadoop/HttpFileSystem.class
> > >> >> > > >
> > >> >> > > > I see that env is set right:
> > >> >> > > >
> > >> >> > > > HADOOP_YARN_HOME=/home/hadoop/hadoop-2.5.2
> > >> >> > > > HADOOP_CONF_DIR=/home/hadoop/hadoop-2.5.2/conf
> > >> >> > > >
> > >> >> > > > Wondering if I am missing anything...
> > >> >> > > > - Shekar
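
The classpath check discussed in this thread (reading the NodeManager's "STARTUP_MSG: classpath" line and looking for the samza-core, samza-yarn, and Scala JARs) can be sketched in Python. This is only a rough sketch, not part of Samza or Hadoop. The function names are hypothetical, "scala-library" is an assumed name for the Scala runtime JAR, and the "classpath = a.jar:b.jar" layout is an assumption about how the startup banner is printed, so adjust both to match your own log.

```python
import posixpath

# Jar-name prefixes to look for. "samza-yarn" and "samza-core" come from the
# thread; "scala-library" is an assumed name for the Scala runtime jar.
REQUIRED_PREFIXES = ("samza-yarn", "samza-core", "scala-library")


def missing_jars(classpath, required=REQUIRED_PREFIXES, sep=":"):
    """Return the required prefixes that match no .jar entry on the classpath."""
    names = [posixpath.basename(entry) for entry in classpath.split(sep) if entry]
    return [
        prefix
        for prefix in required
        if not any(n.startswith(prefix) and n.endswith(".jar") for n in names)
    ]


def classpath_from_log(lines):
    """Return the classpath from the first 'STARTUP_MSG: classpath' line, or None."""
    marker = "STARTUP_MSG:"
    for line in lines:
        if marker in line:
            rest = line.split(marker, 1)[1].strip()
            if rest.startswith("classpath"):
                # Assumed layout: "classpath = a.jar:b.jar:..."
                return rest.split("=", 1)[1].strip() if "=" in rest else ""
    return None


if __name__ == "__main__":
    # Hypothetical classpath with only samza-yarn installed.
    example = (
        "/home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/samza-yarn_2.10-0.8.0.jar:"
        "/home/hadoop/hadoop-2.5.2/share/hadoop/hdfs/lib/hadoop-hdfs-2.5.2.jar"
    )
    print("missing:", missing_jars(example))
```

Running this against the NodeManager log on every machine, not only the RM host, would show quickly whether the JARs are absent from one node's classpath. Wildcard entries such as ".../hdfs/lib/*" are not expanded by this sketch, so it only helps when the log lists individual JAR paths.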