Hey TJ,

Also, for reference, here's an example yarn-site.xml:
http://pastebin.com/6B90YbQh

Cheers,
Chris

On 2/20/14 9:16 AM, "Chris Riccomini" <[email protected]> wrote:

>Hey TJ,
>
>The yarn-site.xml file is found via the YARN_HOME environment variable.
>This variable must be set (export YARN_HOME=...) when you start your NM.
>From there on out, everything gets access to it. When the AM creates a
>YarnConfiguration, the object will load its values from the yarn-site.xml
>(and use the YARN_HOME environment variable to find its location).
>
>You should also verify that your yarn-site.xml for the NMs is
>appropriately configured to point at the RM's host/port.
>
>Also, when you go to the RM's webpage, do you see all of your Active Nodes
>listed? (http://your-rm-host:port/cluster/nodes)
>
>Cheers,
>Chris
>
>On 2/20/14 1:06 AM, "TJ Giuli" <[email protected]> wrote:
>
>>Hi, to follow up on this thread of discussion, I've got a three-node
>>Cloudera CDH5 YARN cluster running and I'm having some problems deploying
>>Samza jobs on the grid. All of the nodes are running a NodeManager and
>>just one is running a ResourceManager. If the ApplicationMaster is
>>deployed to the node with the RM, everything is fine. However, if the
>>job is deployed to one of the other two hosts, the job fails. Looking at
>>the AM log (http://pastebin.com/VxbLiWST), the AM is trying to contact
>>the cluster ResourceManager at 0.0.0.0:8030, which is a YARN default.
>>Nothing is at 0.0.0.0, so the job eventually dies.
>>
>>It looks like yarn-site.xml is not being read by any component of the
>>system and so it's falling back to the default value for the
>>ResourceManager's address. Looking at the code, it seems that
>>org.apache.samza.job.yarn.SamzaAppMaster creates a new YarnConfiguration
>>object and passes it to ClientHelper. Is yarn-site.xml being read in
>>somewhere? Am I missing some key configuration? Thanks!
>>-T
>>
>>On Feb 5, 2014, at 5:59 PM, Chris Riccomini <[email protected]>
>>wrote:
>>
>>> Hey Sonali,
>>>
>>> The next step you need to take is to build your Samza job package (the
>>> .tgz file that contains bin and lib directories). Take a look at
>>> hello-samza, which shows how to build a .tar.gz file with the
>>> appropriate files in it.
>>>
>>> Once you have the .tar.gz file built, you need to publish it somewhere.
>>> This can be HDFS or an HTTP server.
>>>
>>> == IF YOU USE HDFS, SKIP THIS STEP ==
>>>
>>> At LinkedIn, we use an HTTP server. The easiest way to hack this up for
>>> testing is to start a local HTTP server on your developer box with
>>> Python:
>>>
>>> python -m SimpleHTTPServer
>>>
>>> This command will start a simple HTTP server serving files from the
>>> current working directory. So, running that command from the directory
>>> with your .tar.gz job package should work.
>>>
>>> You then need to setup your NMs to be able to read HTTP files, since
>>> Hadoop doesn't support an HTTP-based file system implementation out of
>>> the box. Fortunately, Samza ships with one. To use it, you need to do
>>> two things:
>>>
>>> First, add this to your NM's core-site.xml:
>>>
>>> <configuration>
>>>   <property>
>>>     <name>fs.http.impl</name>
>>>     <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
>>>   </property>
>>> </configuration>
>>>
>>> Second, make sure that you put the following jars into your NM's class
>>> path:
>>>
>>> * grizzled-slf4j
>>> * samza-yarn
>>> * scala-compiler
>>> * scala-library
>>>
>>> Make sure that all of these libraries match the same version of Scala
>>> that samza-yarn was built with.
>>>
>>> The easiest way to add everything to your NM's class path is to put the
>>> files in the lib directory:
>>>
>>> hadoop-2.2.0/share/hadoop/hdfs/lib
>>>
>>> == END OF "IF YOU USE HDFS, SKIP THIS STEP" SECTION ==
>>>
>>> Now, you should have a .tar.gz file with a URI that's either:
>>>
>>> hdfs://foo/bar/your-job-package.tar.gz
>>>
>>> Or:
>>>
>>> http://192.168.0.1/your-job-package.tar.gz
>>>
>>> This path (either the HDFS or HTTP one, depending on which you chose to
>>> use) is what you should set your yarn.package.path configuration
>>> parameter to in your job's configuration file.
>>>
>>> yarn.package.path=http://192.168.0.1/your-job-package.tar.gz
>>>
>>> This tells YARN's NMs where to download your job package from when YARN
>>> begins running it in the grid.
>>>
>>> Finally, you'll want to start your job!
>>>
>>> 1. Make sure that you're using the YarnJobFactory for your
>>> job.factory.class configuration setting (see hello-samza for an
>>> example).
>>> 2. Get a copy of one of your NM's yarn-site.xml and put it somewhere on
>>> your desktop (I usually use ~/.yarn/conf/yarn-site.xml). Note that
>>> there's a "conf" directory there. This is mandatory.
>>> 3. Set up an environment variable called YARN_HOME that points to the
>>> directory that has the "conf" directory in it:
>>>
>>> export YARN_HOME=~/.yarn
>>>
>>> 4. Execute your job with run-job.sh (see
>>> http://samza.incubator.apache.org/startup/hello-samza/0.7.0/ for an
>>> example).
>>>
>>> This should start the job on your YARN grid.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 2/5/14 5:40 PM, "[email protected]"
>>> <[email protected]> wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> So this is what I have now:
>>>> 1. YARN-Cluster with 1 RM and 2 NMs
>>>> 2. Kafka broker running on each NM
>>>> 3. Zookeeper running on the RM
>>>> 4. I downloaded and published (gradlew) the incubator-samza project.
>>>> It's in my /root/m2 repository ready to be used by my project (when I
>>>> create one)
>>>>
>>>> Where do I go from here? How do I get Samza to point to this setup
>>>> exactly?
>>>>
>>>> Thanks,
>>>> Sonali
>>>>
>>>> -----Original Message-----
>>>> From: Chris Riccomini [mailto:[email protected]]
>>>> Sent: Monday, February 03, 2014 12:10 PM
>>>> To: [email protected]
>>>> Subject: Re: Cluster Installation
>>>>
>>>> Hey Sonali,
>>>>
>>>> You will need to set up separately in order to configure your
>>>> yarn-site.xml files for the NMs to point to the RM's host/port. They
>>>> default to localhost, which is what hello-samza is using.
>>>>
>>>> On the Kafka side, the same thing applies: you'll need to configure
>>>> each broker with a unique broker id, etc.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On 2/3/14 11:25 AM, "[email protected]"
>>>> <[email protected]> wrote:
>>>>
>>>>> Ah, makes sense
>>>>>
>>>>> So to have a cluster setup with RM and NMs running on different
>>>>> nodes, can I reuse the "grid" script from "hello-samza"? Or will I
>>>>> have to do the setup separately and then change the config files on
>>>>> samza?
>>>>>
>>>>> Thanks,
>>>>> Sonali
>>>>>
>>>>> -----Original Message-----
>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>> Sent: Monday, February 03, 2014 11:02 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Cluster Installation
>>>>>
>>>>> Hey Sonali,
>>>>>
>>>>> I believe the point at which YARN became version compatible for 2.*
>>>>> was at 2.1.0-beta. I believe 2.0.5 is not API compatible with later
>>>>> versions of YARN (e.g. 2.2). For this reason, you'll need to upgrade
>>>>> your YARN grid, or use a different one with a higher version.
>>>>>
>>>>> For its part, Samza should work with YARN grids 2.1.0-beta and
>>>>> beyond, though I haven't tested this.
>>>>> The YARN community has given a commitment to maintaining API
>>>>> compatibility going forward for YARN 2.*, which means that future
>>>>> upgrades should not be required, until YARN 3 comes out.
>>>>>
>>>>> The rest of your understanding is correct. You can run a 1 RM, 2 NM
>>>>> kind of cluster, throw some Kafka brokers on there, and you should be
>>>>> good to go. You can also re-use your existing ZK, if you wish.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On 2/3/14 10:42 AM, "[email protected]"
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Thanks Chris/Gary.
>>>>>>
>>>>>> I have an existing Zookeeper and YARN Cluster. However, the YARN
>>>>>> version that I have (that came preinstalled with Pivotal HD) is
>>>>>> 2.0.5. So from what you're saying I cannot reuse it for my Samza
>>>>>> deployment.
>>>>>>
>>>>>> So then my option is:
>>>>>> 1. Reuse zookeeper. So I'll have to configure Samza to point to the
>>>>>> right cluster
>>>>>> 2. Run Samza with its YARN grid and Kafka Installation (I can do
>>>>>> this on multiple servers right? 1 RM, 2 NM kind of situation)
>>>>>>
>>>>>> Thanks,
>>>>>> Sonali
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>>> Sent: Friday, January 31, 2014 11:24 AM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Cluster Installation
>>>>>>
>>>>>> Hey Sonali,
>>>>>>
>>>>>> Everything Gary said is correct.
>>>>>>
>>>>>> One other item of note is that if you're interested in running stuff
>>>>>> locally in a dev-mode fashion, you don't need YARN. You can use the
>>>>>> LocalJobFactory instead of the YarnJobFactory factory when
>>>>>> configuring your job's "job.factory.class" setting.
>>>>>>
>>>>>> For "real" deployments, yes you'll need YARN, ZooKeeper, and Kafka.
>>>>>> They can be deployed using any standard way of shipping software
>>>>>> around to a cluster of machines.
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>> On 1/31/14 12:58 AM, "Garry Turkington"
>>>>>> <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Sonali,
>>>>>>>
>>>>>>> This was something that I had some questions about originally as
>>>>>>> well. In terms of required components then yes, for any size of
>>>>>>> Samza deployment you will need all those pieces.
>>>>>>>
>>>>>>> In terms of actual deployment, from what I understand from the
>>>>>>> LinkedIn guys they do run Samza on a dedicated YARN grid that also
>>>>>>> has a Kafka broker collocated on each node. These decisions though
>>>>>>> appear to be more down to convenience than a hard requirement.
>>>>>>>
>>>>>>> In my own setup I have existing ZooKeeper and Kafka clusters that
>>>>>>> I'm pointing Samza at but do need to run a dedicated YARN grid
>>>>>>> because my Hadoop cluster has a pre-2.2 version of YARN running on
>>>>>>> it.
>>>>>>>
>>>>>>> So if you have existing components you can reuse them, if not then
>>>>>>> repurposing the Hello Samza package is a good starting point to get
>>>>>>> all the things you want on the required hosts. Only caveat would be
>>>>>>> to not drop a ZK node on each host, the ZK quorum should follow the
>>>>>>> usual advice of an odd number of servers and likely no more than 3,
>>>>>>> 5 or 7 depending on your deployment size.
>>>>>>>
>>>>>>> Garry
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [email protected]
>>>>>>> [mailto:[email protected]]
>>>>>>> Sent: 30 January 2014 23:38
>>>>>>> To: [email protected]
>>>>>>> Subject: Cluster Installation
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I'm new to working with Samza and have been trying to figure out
>>>>>>> the best cluster configuration. I understand that Samza comes with
>>>>>>> yarn,kafka and zookeeper out of the box. Is that the model just for
>>>>>>> a standalone/local configuration. What if I want a bigger cluster?
>>>>>>> Do I have to install yarn, kafka and zookeeper separately? Any
>>>>>>> suggestions would be great!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sonali
>>>>>>>
>>>>>>> Sonali Parthasarathy
>>>>>>> R&D Developer, Data Insights
>>>>>>> Accenture Technology Labs
>>>>>>> 703-341-7432
>>>>>>>
>>>>>>> ________________________________
>>>>>>>
>>>>>>> This message is for the designated recipient only and may contain
>>>>>>> privileged, proprietary, or otherwise confidential information. If
>>>>>>> you have received it in error, please notify the sender immediately
>>>>>>> and delete the original. Any other use of the e-mail by you is
>>>>>>> prohibited.
>>>>>>> Where allowed by local law, electronic communications with
>>>>>>> Accenture and its affiliates, including e-mail and instant
>>>>>>> messaging (including content), may be scanned by our systems for
>>>>>>> the purposes of information security and assessment of internal
>>>>>>> compliance with Accenture policy.
>>>>>>>
>>>>>>> www.accenture.com
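
For anyone who lands on this thread with the same 0.0.0.0:8030 symptom, here is a minimal Python sketch of a sanity check to run on each NM host. It is not part of Samza or YARN. It assumes the layout described in the thread (yarn-site.xml under $YARN_HOME/conf) and the standard YARN property names for the RM addresses; the function names (yarn_site_path, read_properties, check_rm_settings) are made up for illustration.

```python
import os
import xml.etree.ElementTree as ET

# RM address properties that are assumed to matter here. Their YARN
# defaults resolve to 0.0.0.0 when no yarn-site.xml overrides them.
RM_PROPERTIES = (
    "yarn.resourcemanager.address",
    "yarn.resourcemanager.scheduler.address",
    "yarn.resourcemanager.resource-tracker.address",
)


def yarn_site_path(env=None):
    """Return $YARN_HOME/conf/yarn-site.xml, or None if YARN_HOME is unset."""
    env = os.environ if env is None else env
    home = env.get("YARN_HOME")
    if not home:
        return None
    return os.path.join(os.path.expanduser(home), "conf", "yarn-site.xml")


def read_properties(path):
    """Parse a Hadoop-style <configuration> file into a name -> value dict."""
    root = ET.parse(path).getroot()
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_rm_settings(props):
    """Return a list of problems found with the ResourceManager addresses."""
    # If a real RM hostname is set, the per-service addresses derive from it.
    if props.get("yarn.resourcemanager.hostname", "") not in ("", "0.0.0.0"):
        return []
    problems = []
    for name in RM_PROPERTIES:
        value = props.get(name, "")
        if not value:
            problems.append(name + " is not set")
        elif value.startswith("0.0.0.0"):
            problems.append(name + " points at 0.0.0.0")
    return problems


if __name__ == "__main__":
    path = yarn_site_path()
    if path is None:
        print("YARN_HOME is not set")
    elif not os.path.isfile(path):
        print("no yarn-site.xml at " + path)
    else:
        problems = check_rm_settings(read_properties(path))
        for line in problems or ["RM addresses look configured"]:
            print(line)
```

Run it from the same environment the NM is started from, since that is where YARN_HOME has to be visible for the AM to pick it up.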
