Hey TJ,

The yarn-site.xml file is found via the YARN_HOME environment variable. This variable must be set (export YARN_HOME=...) when you start your NM. From there on out, everything gets access to it. When the AM creates a YarnConfiguration, the object will load its values from the yarn-site.xml (and use the YARN_HOME environment variable to find its location).
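To check that lookup by hand on a node, here is a minimal Python sketch. It only mimics the behaviour described above and is not what YarnConfiguration does internally. The function names (yarn_site_path, load_properties, scheduler_address, check) are hypothetical, and the property names yarn.resourcemanager.scheduler.address and yarn.resourcemanager.hostname, along with port 8030, are assumed from the stock yarn-default.xml. It does not expand ${...} references inside values.

```python
import os
import xml.etree.ElementTree as ET


def yarn_site_path(env=None):
    """Return $YARN_HOME/conf/yarn-site.xml, or None if YARN_HOME is unset."""
    env = os.environ if env is None else env
    home = env.get("YARN_HOME")
    if not home:
        return None
    return os.path.join(os.path.expanduser(home), "conf", "yarn-site.xml")


def load_properties(path):
    """Parse a Hadoop-style <configuration> file into a {name: value} dict."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def scheduler_address(props):
    """Pick the RM scheduler address, falling back to the assumed defaults."""
    explicit = props.get("yarn.resourcemanager.scheduler.address")
    if explicit:
        return explicit
    # Assumption: with nothing configured, the address is 0.0.0.0:8030.
    host = props.get("yarn.resourcemanager.hostname") or "0.0.0.0"
    return host + ":8030"


def check(env=None):
    """Return a one-line diagnosis of what a process on this node would see."""
    path = yarn_site_path(env)
    if path is None:
        return "YARN_HOME is not set"
    if not os.path.isfile(path):
        return "no yarn-site.xml at " + path
    addr = scheduler_address(load_properties(path))
    if addr.startswith("0.0.0.0"):
        return "RM scheduler address is still the default: " + addr
    return "RM scheduler address: " + addr


if __name__ == "__main__":
    print(check())
```

Running this in the same environment that starts the NM shows whether YARN_HOME is visible there and whether the file it points at names a real RM host rather than 0.0.0.0.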

You should also verify that your yarn-site.xml for the NMs is appropriately configured to point at the RM's host/port. Also, when you go to the RM's webpage, do you see all of your Active Nodes listed? (http://your-rm-host:port/cluster/nodes)

Cheers,
Chris

On 2/20/14 1:06 AM, "TJ Giuli" <[email protected]> wrote:

>Hi, to follow up on this thread of discussion, I've got a three-node
>Cloudera CDH5 YARN cluster running and I'm having some problems deploying
>Samza jobs on the grid. All of the nodes are running a NodeManager and
>just one is running a ResourceManager. If the ApplicationMaster is
>deployed to the node with the RM, everything is fine. However, if the
>job is deployed to one of the other two hosts, the job fails. Looking at
>the AM log (http://pastebin.com/VxbLiWST), the AM is trying to contact
>the cluster ResourceManager at 0.0.0.0:8030, which is a YARN default.
>Nothing is at 0.0.0.0, so the job eventually dies.
>
>It looks like yarn-site.xml is not being read by any component of the
>system and so it's falling back to the default value for the
>ResourceManager's address. Looking at the code, it seems that
>org.apache.samza.job.yarn.SamzaAppMaster creates a new YarnConfiguration
>object and passes it to ClientHelper. Is yarn-site.xml being read in
>somewhere? Am I missing some key configuration? Thanks!
>-T
>
>On Feb 5, 2014, at 5:59 PM, Chris Riccomini <[email protected]>
>wrote:
>
>> Hey Sonali,
>>
>> The next step you need to take is to build your Samza job package (the
>> .tgz file that contains bin and lib directories). Take a look at
>> hello-samza, which shows how to build a .tar.gz file with the
>> appropriate files in it.
>>
>> Once you have the .tar.gz file built, you need to publish it somewhere.
>> This can be HDFS or an HTTP server.
>>
>> == IF YOU USE HDFS, SKIP THIS STEP ==
>>
>> At LinkedIn, we use an HTTP server.
>> The easiest way to hack this up for
>> testing is to start a local HTTP server on your developer box with
>> Python:
>>
>>   python -m SimpleHTTPServer
>>
>> This command will start a simple HTTP server serving files from the
>> current working directory. So, running that command from the directory
>> with your .tar.gz job package should work.
>>
>> You then need to set up your NMs to be able to read HTTP files, since
>> Hadoop doesn't support an HTTP-based file system implementation out of
>> the box. Fortunately, Samza ships with one. To use it, you need to do
>> two things:
>>
>> First, add this to your NM's core-site.xml:
>>
>> <configuration>
>>   <property>
>>     <name>fs.http.impl</name>
>>     <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
>>   </property>
>> </configuration>
>>
>> Second, make sure that you put the following jars into your NM's class
>> path:
>>
>> * grizzled-slf4j
>> * samza-yarn
>> * scala-compiler
>> * scala-library
>>
>> Make sure that all of these libraries match the same version of Scala
>> that samza-yarn was built with.
>>
>> The easiest way to add everything to your NM's class path is to put the
>> files in the lib directory:
>>
>>   hadoop-2.2.0/share/hadoop/hdfs/lib
>>
>> == END OF "IF YOU USE HDFS, SKIP THIS STEP" SECTION ==
>>
>> Now, you should have a .tar.gz file with a URI that's either:
>>
>>   hdfs://foo/bar/your-job-package.tar.gz
>>
>> Or:
>>
>>   http://192.168.0.1/your-job-package.tar.gz
>>
>> This path (either the HDFS or HTTP one, depending on which you chose to
>> use) is what you should set your yarn.package.path configuration
>> parameter to in your job's configuration file.
>>
>>   yarn.package.path=http://192.168.0.1/your-job-package.tar.gz
>>
>> This tells YARN's NMs where to download your job package from when YARN
>> begins running it in the grid.
>>
>> Finally, you'll want to start your job!
>>
>> 1. Make sure that you're using the YarnJobFactory for your
>> job.factory.class configuration setting (see hello-samza for an
>> example).
>> 2. Get a copy of one of your NM's yarn-site.xml and put it somewhere on
>> your desktop (I usually use ~/.yarn/conf/yarn-site.xml). Note that
>> there's a "conf" directory there. This is mandatory.
>> 3. Set up an environment variable called YARN_HOME that points to the
>> directory that has the "conf" directory in it:
>>
>>   export YARN_HOME=~/.yarn
>>
>> 4. Execute your job with run-job.sh (see
>> http://samza.incubator.apache.org/startup/hello-samza/0.7.0/ for an
>> example).
>>
>> This should start the job on your YARN grid.
>>
>> Cheers,
>> Chris
>>
>> On 2/5/14 5:40 PM, "[email protected]"
>> <[email protected]> wrote:
>>
>>> Hi Chris,
>>>
>>> So this is what I have now:
>>> 1. YARN cluster with 1 RM and 2 NMs
>>> 2. Kafka broker running on each NM
>>> 3. Zookeeper running on the RM
>>> 4. I downloaded and published (gradlew) the incubator-samza project.
>>> It's in my /root/m2 repository ready to be used by my project (when I
>>> create one)
>>>
>>> Where do I go from here? How do I get Samza to point to this setup
>>> exactly?
>>>
>>> Thanks,
>>> Sonali
>>>
>>> -----Original Message-----
>>> From: Chris Riccomini [mailto:[email protected]]
>>> Sent: Monday, February 03, 2014 12:10 PM
>>> To: [email protected]
>>> Subject: Re: Cluster Installation
>>>
>>> Hey Sonali,
>>>
>>> You will need to do the setup separately in order to configure your
>>> yarn-site.xml files for the NMs to point to the RM's host/port. They
>>> default to localhost, which is what hello-samza is using.
>>>
>>> On the Kafka side, the same thing applies: you'll need to configure
>>> each broker with a unique broker id, etc.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 2/3/14 11:25 AM, "[email protected]"
>>> <[email protected]> wrote:
>>>
>>>> Ah, makes sense
>>>>
>>>> So to have a cluster setup with RM and NMs running on different nodes,
>>>> can I reuse the "grid" script from "hello-samza"? Or will I have to do
>>>> the setup separately and then change the config files on samza?
>>>>
>>>> Thanks,
>>>> Sonali
>>>>
>>>> -----Original Message-----
>>>> From: Chris Riccomini [mailto:[email protected]]
>>>> Sent: Monday, February 03, 2014 11:02 AM
>>>> To: [email protected]
>>>> Subject: Re: Cluster Installation
>>>>
>>>> Hey Sonali,
>>>>
>>>> I believe the point at which YARN became version compatible for 2.* was
>>>> at 2.1.0-beta. I believe 2.0.5 is not API compatible with later
>>>> versions of YARN (e.g. 2.2). For this reason, you'll need to upgrade
>>>> your YARN grid, or use a different one with a higher version.
>>>>
>>>> For its part, Samza should work with YARN grids 2.1.0-beta and beyond,
>>>> though I haven't tested this. The YARN community has given a
>>>> commitment to maintaining API compatibility going forward for YARN 2.*,
>>>> which means that future upgrades should not be required, until YARN 3
>>>> comes out.
>>>>
>>>> The rest of your understanding is correct. You can run a 1 RM, 2 NM
>>>> kind of cluster, throw some Kafka brokers on there, and you should be
>>>> good to go. You can also re-use your existing ZK, if you wish.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On 2/3/14 10:42 AM, "[email protected]"
>>>> <[email protected]> wrote:
>>>>
>>>>> Thanks Chris/Garry.
>>>>>
>>>>> I have an existing Zookeeper and YARN Cluster. However, the YARN
>>>>> version that I have (that came preinstalled with Pivotal HD) is
>>>>> 2.0.5. So from what you're saying I cannot reuse it for my Samza
>>>>> deployment.
>>>>>
>>>>> So then my options are:
>>>>> 1. Reuse zookeeper. So I'll have to configure Samza to point to the
>>>>> right cluster
>>>>> 2. Run Samza with its YARN grid and Kafka Installation
>>>>> (I can do this on multiple servers right? 1 RM, 2 NM kind of
>>>>> situation)
>>>>>
>>>>> Thanks,
>>>>> Sonali
>>>>>
>>>>> -----Original Message-----
>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>> Sent: Friday, January 31, 2014 11:24 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Cluster Installation
>>>>>
>>>>> Hey Sonali,
>>>>>
>>>>> Everything Garry said is correct.
>>>>>
>>>>> One other item of note is that if you're interested in running stuff
>>>>> locally in a dev-mode fashion, you don't need YARN. You can use the
>>>>> LocalJobFactory instead of the YarnJobFactory when configuring
>>>>> your job's "job.factory.class" setting.
>>>>>
>>>>> For "real" deployments, yes you'll need YARN, ZooKeeper, and Kafka.
>>>>> They can be deployed using any standard way of shipping software
>>>>> around to a cluster of machines.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On 1/31/14 12:58 AM, "Garry Turkington"
>>>>> <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Sonali,
>>>>>>
>>>>>> This was something that I had some questions about originally as
>>>>>> well. In terms of required components then yes, for any size of Samza
>>>>>> deployment you will need all those pieces.
>>>>>>
>>>>>> In terms of actual deployment, from what I understand from the
>>>>>> LinkedIn guys they do run Samza on a dedicated YARN grid that also
>>>>>> has a Kafka broker collocated on each node. These decisions though
>>>>>> appear to be more down to convenience than a hard requirement.
>>>>>>
>>>>>> In my own setup I have existing ZooKeeper and Kafka clusters that
>>>>>> I'm pointing Samza at but do need to run a dedicated YARN grid because
>>>>>> my Hadoop cluster has a pre-2.2 version of YARN running on it.
>>>>>>
>>>>>> So if you have existing components you can reuse them, if not then
>>>>>> repurposing the Hello Samza package is a good starting point to get
>>>>>> all the things you want on the required hosts. Only caveat would be
>>>>>> to not drop a ZK node on each host, the ZK quorum should follow the
>>>>>> usual advice of an odd number of servers and likely no more than 3,
>>>>>> 5 or 7 depending on your deployment size.
>>>>>>
>>>>>> Garry
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]]
>>>>>> Sent: 30 January 2014 23:38
>>>>>> To: [email protected]
>>>>>> Subject: Cluster Installation
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm new to working with Samza and have been trying to figure out the
>>>>>> best cluster configuration. I understand that Samza comes with
>>>>>> yarn, kafka and zookeeper out of the box. Is that the model just for
>>>>>> a standalone/local configuration? What if I want a bigger cluster? Do
>>>>>> I have to install yarn, kafka and zookeeper separately? Any
>>>>>> suggestions would be great!
>>>>>>
>>>>>> Thanks,
>>>>>> Sonali
>>>>>>
>>>>>> Sonali Parthasarathy
>>>>>> R&D Developer, Data Insights
>>>>>> Accenture Technology Labs
>>>>>> 703-341-7432
>>>>>>
>>>>>> ________________________________
>>>>>>
>>>>>> This message is for the designated recipient only and may contain
>>>>>> privileged, proprietary, or otherwise confidential information. If
>>>>>> you have received it in error, please notify the sender immediately
>>>>>> and delete the original. Any other use of the e-mail by you is
>>>>>> prohibited.
>>>>>> Where allowed by local law, electronic communications with Accenture
>>>>>> and its affiliates, including e-mail and instant messaging
>>>>>> (including content), may be scanned by our systems for the purposes
>>>>>> of information security and assessment of internal compliance with
>>>>>> Accenture policy.
>>>>>>
>>>>>> www.accenture.com
