Hey TJ,

The yarn-site.xml file is found via the YARN_HOME environment variable.
This variable must be set (export YARN_HOME=...) when you start your NM.
From there on out, everything gets access to it. When the AM creates a
YarnConfiguration, the object will load its values from the yarn-site.xml
(and use the YARN_HOME environment variable to find its location).
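
If you want to double-check what a process on a given node would see, here is a minimal Python sketch (not part of Samza or YARN). It assumes the $YARN_HOME/conf/yarn-site.xml layout described later in this thread; the function names are made up for illustration.

```python
import os
from pathlib import Path


def yarn_site_path(env=os.environ):
    """Return the expected yarn-site.xml path under YARN_HOME, or None if unset."""
    yarn_home = env.get("YARN_HOME")
    if not yarn_home:
        return None
    # Assumed layout (from this thread): $YARN_HOME/conf/yarn-site.xml
    return Path(yarn_home).expanduser() / "conf" / "yarn-site.xml"


def describe(env=os.environ):
    """Summarize whether yarn-site.xml can be located through YARN_HOME."""
    path = yarn_site_path(env)
    if path is None:
        return "YARN_HOME is not set"
    if not path.is_file():
        return f"YARN_HOME is set, but {path} does not exist"
    return f"found {path}"


# Report on the environment this script itself runs in.
print(describe())
```

Run it in the same environment (same user, same shell profile) that starts the NM, since that is the environment the containers inherit.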

You should also verify that your yarn-site.xml for the NMs is
appropriately configured to point at the RM's host/port.
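
As a rough way to check this, the Python sketch below parses a yarn-site.xml and works out which scheduler address a client would end up with. It assumes Hadoop 2.2-style defaults, where yarn.resourcemanager.scheduler.address falls back to the value of yarn.resourcemanager.hostname plus port 8030 and the hostname itself defaults to 0.0.0.0. The helper names are hypothetical, and it does not expand ${...} references inside values.

```python
import xml.etree.ElementTree as ET

WILDCARD_HOST = "0.0.0.0"
DEFAULT_SCHEDULER_PORT = 8030  # assumed YARN default, matches the AM log above


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_scheduler_address(props):
    """Return the scheduler address a client would use (assumed fallback rules)."""
    explicit = props.get("yarn.resourcemanager.scheduler.address")
    if explicit:
        return explicit
    host = props.get("yarn.resourcemanager.hostname", WILDCARD_HOST)
    return f"{host}:{DEFAULT_SCHEDULER_PORT}"


def is_wildcard(address):
    """True if the host part is 0.0.0.0, i.e. nothing real was configured."""
    return address.split(":")[0] == WILDCARD_HOST
```

For example, is_wildcard(effective_scheduler_address(load_properties(open(path).read()))) returning True for an NM's yarn-site.xml would be consistent with the 0.0.0.0:8030 address in the AM log.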

Also, when you go to the RM's webpage, do you see all of your Active Nodes
listed? (http://your-rm-host:port/cluster/nodes)
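
If you would rather check this from a script, the RM also exposes the node list over its REST API at /ws/v1/cluster/nodes. Here is a minimal sketch, assuming the JSON has the usual {"nodes": {"node": [...]}} shape with "state" and "nodeHostName" fields; the function names are made up, and rm_base_url is whatever host/port your RM web UI is on.

```python
import json
from urllib.request import urlopen


def running_nodes(payload):
    """Return host names of nodes in the RUNNING state from a nodes payload."""
    # "nodes" can be missing or null when no NM has registered.
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [n.get("nodeHostName") for n in nodes if n.get("state") == "RUNNING"]


def fetch_nodes(rm_base_url, timeout=10):
    """Fetch the node list; rm_base_url is e.g. 'http://your-rm-host:port'."""
    with urlopen(f"{rm_base_url}/ws/v1/cluster/nodes", timeout=timeout) as resp:
        return json.load(resp)
```

Calling running_nodes(fetch_nodes("http://your-rm-host:port")) should list all three of your hosts if every NM has registered with the RM.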

Cheers,
Chris

On 2/20/14 1:06 AM, "TJ Giuli" <[email protected]> wrote:

>Hi, to follow up on this thread of discussion, I've got a three-node
>Cloudera CDH5 YARN cluster running and I'm having some problems deploying
>Samza jobs on the grid.  All of the nodes are running a NodeManager and
>just one is running a ResourceManager.  If the ApplicationMaster is
>deployed to the node with the RM, everything is fine.  However, if the
>job is deployed to one of the other two hosts, the job fails.  Looking at
>the AM log (http://pastebin.com/VxbLiWST), the AM is trying to contact
>the cluster ResourceManager at 0.0.0.0:8030, which is a YARN default.
>Nothing is at 0.0.0.0, so the job eventually dies.
>
>It looks like yarn-site.xml is not being read by any component of the
>system and so it's falling back to the default value for the
>ResourceManager's address.  Looking at the code, it seems that
>org.apache.samza.job.yarn.SamzaAppMaster creates a new YarnConfiguration
>object and passes it to ClientHelper.  Is yarn-site.xml being read in
>somewhere?  Am I missing some key configuration?  Thanks!
>-T
>
>On Feb 5, 2014, at 5:59 PM, Chris Riccomini <[email protected]>
>wrote:
>
>> Hey Sonali,
>> 
>> The next step you need to take is to build your Samza job package (the
>> .tgz file that contains bin and lib directories). Take a look at
>> hello-samza, which shows how to build a .tar.gz file with the appropriate
>> files in it.
>> 
>> Once you have the .tar.gz file built, you need to publish it somewhere.
>> This can be HDFS or an HTTP server.
>> 
>> == IF YOU USE HDFS, SKIP THIS STEP ==
>> 
>> At LinkedIn, we use an HTTP server. The easiest way to hack this up for
>> testing is to start a local HTTP server on your developer box with Python:
>> 
>>  python -m SimpleHTTPServer
>> 
>> This command will start a simple HTTP server serving files from the
>> current working directory. So, running that command from the directory
>> with your .tar.gz job package should work.
>> 
>> You then need to set up your NMs to be able to read HTTP files, since
>> Hadoop doesn't support an HTTP-based file system implementation out of
>> the box. Fortunately, Samza ships with one. To use it, you need to do two
>> things:
>> 
>> First, add this to your NM's core-site.xml:
>> 
>> <configuration>
>>  <property>
>>    <name>fs.http.impl</name>
>>    <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
>>  </property>
>> </configuration>
>> 
>> Second, make sure that you put the following jars into your NM's class
>> path:
>> 
>> 
>> * grizzled-slf4j
>> * samza-yarn
>> * scala-compiler
>> * scala-library
>> 
>> Make sure that all of these libraries match the same version of Scala
>> that samza-yarn was built with.
>> 
>> The easiest way to add everything to your NM's class path is to put the
>> files in the lib directory:
>> 
>>  hadoop-2.2.0/share/hadoop/hdfs/lib
>> 
>> == END OF "IF YOU USE HDFS, SKIP THIS STEP" SECTION ==
>> 
>> 
>> Now, you should have a .tar.gz file with a URI that's either:
>> 
>>  hdfs://foo/bar/your-job-package.tar.gz
>> 
>> Or:
>> 
>>  http://192.168.0.1/your-job-package.tar.gz
>> 
>> This path (either the HDFS or HTTP one, depending on which you chose to
>> use) is what you should set your yarn.package.path configuration
>> parameter to in your job's configuration file.
>> 
>>  yarn.package.path=http://192.168.0.1/your-job-package.tar.gz
>> 
>> This tells YARN's NMs where to download your job package from when YARN
>> begins running it in the grid.
>> 
>> Finally, you'll want to start your job!
>> 
>> 1. Make sure that you're using the YarnJobFactory for your
>> job.factory.class configuration setting (see hello-samza for an
>> example).
>> 2. Get a copy of one of your NM's yarn-site.xml and put it somewhere on
>> your desktop (I usually use ~/.yarn/conf/yarn-site.xml). Note that
>> there's a "conf" directory there. This is mandatory.
>> 3. Set up an environment variable called YARN_HOME that points to the
>> directory that has the "conf" directory in it:
>> 
>>  export YARN_HOME=~/.yarn
>> 
>> 4. Execute your job with run-job.sh (see
>> http://samza.incubator.apache.org/startup/hello-samza/0.7.0/ for an
>> example).
>> 
>> This should start the job on your YARN grid.
>> 
>> Cheers,
>> Chris
>> 
>> On 2/5/14 5:40 PM, "[email protected]"
>> <[email protected]> wrote:
>> 
>>> Hi Chris,
>>> 
>>> So this is what I have now:
>>> 1. YARN-Cluster with 1 RM and 2 NMs
>>> 2. Kafka broker running on each NM
>>> 3. Zookeeper running on the RM
>>> 4. I downloaded and published (gradlew) the incubator-samza project. It's
>>> in my /root/m2 repository ready to be used by my project (when I create
>>> one)
>>> 
>>> Where do I go from here? How do I get Samza to point to this setup
>>> exactly?
>>> 
>>> Thanks,
>>> Sonali
>>> 
>>> -----Original Message-----
>>> From: Chris Riccomini [mailto:[email protected]]
>>> Sent: Monday, February 03, 2014 12:10 PM
>>> To: [email protected]
>>> Subject: Re: Cluster Installation
>>> 
>>> Hey Sonali,
>>> 
>>> You will need to do the setup separately in order to configure your
>>> yarn-site.xml files for the NMs to point to the RM's host/port. They
>>> default to localhost, which is what hello-samza is using.
>>> 
>>> On the Kafka side, the same thing applies: you'll need to configure
>>> each broker with a unique broker id, etc.
>>> 
>>> Cheers,
>>> Chris
>>> 
>>> On 2/3/14 11:25 AM, "[email protected]"
>>> <[email protected]> wrote:
>>> 
>>>> Ah, makes sense
>>>> 
>>>> So to have a cluster setup with RM and NMs running on different nodes,
>>>> Can I reuse the "grid" script from "hello-samza"? or will I have to do
>>>> the setup separately and then change the config files on samza?
>>>> 
>>>> Thanks,
>>>> Sonali
>>>> 
>>>> -----Original Message-----
>>>> From: Chris Riccomini [mailto:[email protected]]
>>>> Sent: Monday, February 03, 2014 11:02 AM
>>>> To: [email protected]
>>>> Subject: Re: Cluster Installation
>>>> 
>>>> Hey Sonali,
>>>> 
>>>> I believe the point at which YARN became version compatible for 2.* was
>>>> at 2.1.0-beta. I believe 2.0.5 is not API compatible with later
>>>> versions of YARN (e.g. 2.2). For this reason, you'll need to upgrade
>>>> your YARN grid, or use a different one with a higher version.
>>>> 
>>>> For its part, Samza should work with YARN grids 2.1.0-beta and beyond,
>>>> though I haven't tested this. The YARN community has given a commitment
>>>> to maintaining API compatibility going forward for YARN 2.*, which
>>>> means that future upgrades should not be required, until YARN 3 comes
>>>> out.
>>>> 
>>>> The rest of your understanding is correct. You can run a 1 RM, 2 NM
>>>> kind of cluster, throw some Kafka brokers on there, and you should be
>>>> good to go. You can also re-use your existing ZK, if you wish.
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> On 2/3/14 10:42 AM, "[email protected]"
>>>> <[email protected]> wrote:
>>>> 
>>>>> Thanks Chris/Gary.
>>>>> 
>>>>> I have an existing Zookeeper and YARN Cluster. However, the YARN
>>>>> version that I have (that came preinstalled with Pivotal HD) is 2.0.5.
>>>>> So from what you're saying I cannot reuse it for my Samza deployment.
>>>>> 
>>>>> So then my option is:
>>>>> 1. Reuse zookeeper. So I'll have to configure Samza to point to the
>>>>> right cluster
>>>>> 2. Run Samza with its YARN grid and Kafka Installation
>>>>> (I can do this on multiple servers right? 1 RM, 2 NM kind of
>>>>> situation)
>>>>> 
>>>>> Thanks,
>>>>> Sonali
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>> Sent: Friday, January 31, 2014 11:24 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Cluster Installation
>>>>> 
>>>>> Hey Sonali,
>>>>> 
>>>>> Everything Gary said is correct.
>>>>> 
>>>>> One other item of note is that if you're interested in running stuff
>>>>> locally in a dev-mode fashion, you don't need YARN. You can use the
>>>>> LocalJobFactory instead of the YarnJobFactory when configuring
>>>>> your job's "job.factory.class" setting.
>>>>> 
>>>>> For "real" deployments, yes you'll need YARN, ZooKeeper, and Kafka.
>>>>> They can be deployed using any standard way of shipping software
>>>>> around to a cluster of machines.
>>>>> 
>>>>> Cheers,
>>>>> Chris
>>>>> 
>>>>> On 1/31/14 12:58 AM, "Garry Turkington"
>>>>> <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi Sonali,
>>>>>> 
>>>>>> This was something that I had some questions about originally as well.
>>>>>> In terms of required components then yes, for any size of Samza
>>>>>> deployment you will need all those pieces.
>>>>>> 
>>>>>> In terms of actual deployment, from what I understand from the
>>>>>> LinkedIn guys they do run Samza on a dedicated YARN grid that also
>>>>>> has a Kafka broker collocated on each node. These decisions though
>>>>>> appear to be more down to convenience than a hard requirement.
>>>>>> 
>>>>>> In my own setup I have existing ZooKeeper and Kafka clusters that I'm
>>>>>> pointing Samza at but do need to run a dedicated YARN grid because my
>>>>>> Hadoop cluster has a pre-2.2 version of YARN running on it.
>>>>>> 
>>>>>> So if you have existing components you can reuse them; if not, then
>>>>>> repurposing the Hello Samza package is a good starting point to get
>>>>>> all the things you want on the required hosts. The only caveat would be
>>>>>> to not drop a ZK node on each host: the ZK quorum should follow the
>>>>>> usual advice of an odd number of servers and likely no more than 3, 5
>>>>>> or 7 depending on your deployment size.
>>>>>> 
>>>>>> Garry
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]]
>>>>>> Sent: 30 January 2014 23:38
>>>>>> To: [email protected]
>>>>>> Subject: Cluster Installation
>>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> I'm new to working with Samza and have been trying to figure out the
>>>>>> best cluster configuration. I understand that Samza comes with
>>>>>> yarn, kafka and zookeeper out of the box. Is that the model just for a
>>>>>> standalone/local configuration? What if I want a bigger cluster? Do I
>>>>>> have to install yarn, kafka and zookeeper separately? Any suggestions
>>>>>> would be great!
>>>>>> 
>>>>>> Thanks,
>>>>>> Sonali
>>>>>> 
>>>>>> Sonali Parthasarathy
>>>>>> R&D Developer, Data Insights
>>>>>> Accenture Technology Labs
>>>>>> 703-341-7432
>>>>>> 
>>>>>> 
>>>>>> ________________________________
>>>>>> 
>>>>>> This message is for the designated recipient only and may contain
>>>>>> privileged, proprietary, or otherwise confidential information. If
>>>>>> you have received it in error, please notify the sender immediately
>>>>>> and delete the original. Any other use of the e-mail by you is
>>>>>> prohibited.
>>>>>> Where allowed by local law, electronic communications with Accenture
>>>>>> and its affiliates, including e-mail and instant messaging
>>>>>>(including
>>>>>> content), may be scanned by our systems for the purposes of
>>>>>> information security and assessment of internal compliance with
>>>>>> Accenture policy. .
>>>>>> 
>>>>>> ________________________________
>>>>>> 
>>>>>> www.accenture.com
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>
