Re: Cluster Installation

Chris Riccomini Thu, 20 Feb 2014 09:23:33 -0800

Hey TJ,

Also, for reference, here's an example yarn-site.xml:


  http://pastebin.com/6B90YbQh

Cheers,
Chris

On 2/20/14 9:16 AM, "Chris Riccomini" <[email protected]> wrote:

>Hey TJ,
>
>The yarn-site.xml file is found via the YARN_HOME environment variable.
>This variable must be set (export YARN_HOME=...) when you start your NM.
>From there on out, everything gets access to it. When the AM creates a
>YarnConfiguration, the object will load its values from the yarn-site.xml
>(and use the YARN_HOME environment variable to find its location).
>
>You should also verify that your yarn-site.xml for the NMs is
>appropriately configured to point at the RM's host/port.
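A minimal sketch (Python 3, not part of Samza or YARN) of the check described above: find yarn-site.xml under $YARN_HOME/conf and see whether the RM address is still the 0.0.0.0 default. The helper names are made up for illustration; the property names are the standard ones from yarn-default.xml, and which of them your distribution actually sets is an assumption to verify.

```python
import os
import xml.etree.ElementTree as ET

# Standard YARN property names; 8030 is the default scheduler port the AM contacts.
RM_KEYS = (
    "yarn.resourcemanager.hostname",
    "yarn.resourcemanager.address",
    "yarn.resourcemanager.scheduler.address",
)


def yarn_site_path(yarn_home):
    """Return the expected location: <YARN_HOME>/conf/yarn-site.xml."""
    return os.path.join(yarn_home, "conf", "yarn-site.xml")


def read_rm_settings(path):
    """Parse a Hadoop-style <configuration> file and return the RM properties it sets."""
    root = ET.parse(path).getroot()
    found = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name in RM_KEYS:
            found[name] = (prop.findtext("value") or "").strip()
    return found


def points_at_default(settings):
    """True if no RM property is set, or any of them still uses the 0.0.0.0 wildcard."""
    if not settings:
        return True
    return any(value.split(":")[0] == "0.0.0.0" for value in settings.values())
```

Running read_rm_settings(yarn_site_path(os.environ["YARN_HOME"])) on each NM host and passing the result to points_at_default gives a quick yes/no on whether that host would fall back to 0.0.0.0:8030.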
>
>Also, when you go to the RM's webpage, do you see all of your Active Nodes
>listed? (http://your-rm-host:port/cluster/nodes)
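The same node list is also available as JSON from the RM's REST API (/ws/v1/cluster/nodes), which is handy for scripting. A rough Python 3 sketch follows; the function names are hypothetical, and the response layout (a "nodes" object holding a "node" list with "state" and "nodeHostName" fields) is what the Hadoop 2.x REST documentation describes, so check it against your CDH version.

```python
import json
import urllib.request


def fetch_nodes(rm_http_address):
    """Fetch the node report from the RM web address, given as 'host:port'."""
    url = "http://%s/ws/v1/cluster/nodes" % rm_http_address
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def running_nodes(payload):
    """Return the sorted host names of all nodes reported as RUNNING."""
    nodes = (payload.get("nodes") or {}).get("node") or []
    return sorted(n["nodeHostName"] for n in nodes if n.get("state") == "RUNNING")
```

With three healthy NMs, running_nodes(fetch_nodes("your-rm-host:port")) should list all three hosts; a shorter list suggests an NM is not registering with the RM.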
>
>Cheers,
>Chris
>
>On 2/20/14 1:06 AM, "TJ Giuli" <[email protected]> wrote:
>
>>Hi, to follow up on this thread of discussion, I've got a three-node
>>Cloudera CDH5 YARN cluster running and I'm having some problems deploying
>>Samza jobs on the grid.  All of the nodes are running a NodeManager and
>>just one is running a ResourceManager.  If the ApplicationMaster is
>>deployed to the node with the RM, everything is fine.  However, if the
>>job is deployed to one of the other two hosts, the job fails.  Looking at
>>the AM log (http://pastebin.com/VxbLiWST), the AM is trying to contact
>>the cluster ResourceManager at 0.0.0.0:8030, which is a YARN default.
>>Nothing is at 0.0.0.0, so the job eventually dies.
>>
>>It looks like yarn-site.xml is not being read by any component of the
>>system and so it's falling back to the default value for the
>>ResourceManager's address.  Looking at the code, it seems that
>>org.apache.samza.job.yarn.SamzaAppMaster creates a new YarnConfiguration
>>object and passes it to ClientHelper.  Is yarn-site.xml being read in
>>somewhere?  Am I missing some key configuration?  Thanks!
>>--T
>>
>>On Feb 5, 2014, at 5:59 PM, Chris Riccomini <[email protected]>
>>wrote:
>>
>>> Hey Sonali,
>>> 
>>> The next step you need to take is to build your Samza job package (the
>>> .tgz file that contains bin and lib directories). Take a look at
>>> hello-samza, which shows how to build a .tar.gz file with the
>>> appropriate files in it.
>>> 
>>> Once you have the .tar.gz file built, you need to publish it somewhere.
>>> This can be HDFS or an HTTP server.
>>> 
>>> == IF YOU USE HDFS, SKIP THIS STEP ==
>>> 
>>> At LinkedIn, we use an HTTP server. The easiest way to hack this up for
>>> testing is to start a local HTTP server on your developer box with
>>> Python:
>>> 
>>>  python -m SimpleHTTPServer
>>> 
>>> This command will start a simple HTTP server serving files from the
>>> current working directory. So, running that command from the directory
>>> with your .tar.gz job package should work.
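Note that SimpleHTTPServer is the Python 2 module name; on Python 3 the one-line equivalent is "python3 -m http.server". If you would rather start the server from a script, here is a small sketch assuming Python 3.7 or later. The serve_directory helper and the default port 8000 are illustrative choices, not anything Samza requires.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve files from `directory` over HTTP on all interfaces, in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() to stop serving
```

Binding to all interfaces matters here because the NMs on other hosts have to be able to download the package; a server bound only to localhost would work for a single-box test and then fail on the grid.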
>>> 
>>> You then need to set up your NMs to be able to read HTTP files, since
>>> Hadoop doesn't support an HTTP-based file system implementation out of
>>> the box. Fortunately, Samza ships with one. To use it, you need to do
>>> two things:
>>> 
>>> First, add this to your NM's core-site.xml:
>>> 
>>> <configuration>
>>>  <property>
>>>    <name>fs.http.impl</name>
>>>    <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
>>>  </property>
>>> </configuration>
>>> 
>>> Second, make sure that you put the following jars into your NM's class
>>> path:
>>> 
>>> 
>>> * grizzled-slf4j
>>> * samza-yarn
>>> * scala-compiler
>>> * scala-library
>>> 
>>> Make sure that all of these libraries match the same version of Scala
>>> that samza-yarn was built with.
>>> 
>>> The easiest way to add everything to your NM's class path is to put the
>>> files in the lib directory:
>>> 
>>>  hadoop-2.2.0/share/hadoop/hdfs/lib
>>> 
>>> == END OF "IF YOU USE HDFS, SKIP THIS STEP" SECTION ==
>>> 
>>> 
>>> Now, you should have a .tar.gz file with a URI that's either:
>>> 
>>>  hdfs://foo/bar/your-job-package.tar.gz
>>> 
>>> Or:
>>> 
>>>  http://192.168.0.1/your-job-package.tar.gz
>>> 
>>> This path (either the HDFS or HTTP one, depending on which you chose to
>>> use) is what you should set your yarn.package.path configuration
>>> parameter to in your job's configuration file.
>>> 
>>>  yarn.package.path=http://192.168.0.1/your-job-package.tar.gz
>>> 
>>> This tells YARN's NMs where to download your job package from when YARN
>>> begins running it in the grid.
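A typo in yarn.package.path only shows up once YARN tries to download the package, so a quick pre-flight check can save a round trip. This is an illustrative Python 3 sketch, not a Samza tool: the function names are hypothetical, and limiting the scheme to hdfs and http simply mirrors the two options discussed in this thread.

```python
from urllib.parse import urlparse


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_package_path(props):
    """Return yarn.package.path if it looks like an hdfs:// or http:// tarball URI."""
    uri = props.get("yarn.package.path", "")
    parsed = urlparse(uri)
    if parsed.scheme not in ("hdfs", "http"):
        raise ValueError("unexpected scheme in yarn.package.path: %r" % uri)
    if not parsed.path.endswith((".tar.gz", ".tgz")):
        raise ValueError("yarn.package.path does not name a .tar.gz/.tgz file: %r" % uri)
    return uri
```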
>>> 
>>> Finally, you'll want to start your job!
>>> 
>>> 1. Make sure that you're using the YarnJobFactory for your
>>> job.factory.class configuration setting (see hello-samza for an
>>> example).
>>> 2. Get a copy of one of your NM's yarn-site.xml and put it somewhere on
>>> your desktop (I usually use ~/.yarn/conf/yarn-site.xml). Note that
>>> there's a "conf" directory there. This is mandatory.
>>> 3. Set up an environment variable called YARN_HOME that points to the
>>> directory that has the "conf" directory in it:
>>> 
>>>  export YARN_HOME=~/.yarn
>>> 
>>> 4. Execute your job with run-job.sh (see
>>> http://samza.incubator.apache.org/startup/hello-samza/0.7.0/ for an
>>> example).
>>> 
>>> This should start the job on your YARN grid.
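Steps 2 and 3 above can be scripted. The following Python 3 sketch (helper names are hypothetical) copies a yarn-site.xml into the mandatory conf directory and builds an environment with YARN_HOME set, which could then be handed to whatever launches run-job.sh.

```python
import os
import shutil


def prepare_yarn_home(yarn_site_xml, yarn_home):
    """Copy yarn-site.xml to <yarn_home>/conf/yarn-site.xml, creating conf if needed."""
    conf_dir = os.path.join(yarn_home, "conf")
    os.makedirs(conf_dir, exist_ok=True)
    target = os.path.join(conf_dir, "yarn-site.xml")
    shutil.copyfile(yarn_site_xml, target)
    return target


def job_environment(yarn_home, base_env=None):
    """Return a copy of the environment with YARN_HOME pointing at yarn_home."""
    env = dict(os.environ if base_env is None else base_env)
    env["YARN_HOME"] = os.path.abspath(os.path.expanduser(yarn_home))
    return env
```

The dictionary from job_environment("~/.yarn") can be passed as the env argument of subprocess.run when invoking run-job.sh; the arguments for run-job.sh itself are covered in the hello-samza page linked above.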
>>> 
>>> Cheers,
>>> Chris
>>> 
>>> On 2/5/14 5:40 PM, "[email protected]"
>>> <[email protected]> wrote:
>>> 
>>>> Hi Chris,
>>>> 
>>>> So this is what I have now:
>>>> 1. YARN cluster with 1 RM and 2 NMs
>>>> 2. Kafka broker running on each NM
>>>> 3. ZooKeeper running on the RM
>>>> 4. I downloaded and published (gradlew) the incubator-samza project.
>>>> It's in my /root/m2 repository, ready to be used by my project (when I
>>>> create one).
>>>> 
>>>> Where do I go from here? How do I get Samza to point to this setup
>>>> exactly?
>>>> 
>>>> Thanks,
>>>> Sonali
>>>> 
>>>> -----Original Message-----
>>>> From: Chris Riccomini [mailto:[email protected]]
>>>> Sent: Monday, February 03, 2014 12:10 PM
>>>> To: [email protected]
>>>> Subject: Re: Cluster Installation
>>>> 
>>>> Hey Sonali,
>>>> 
>>>> You will need to do the setup separately in order to configure your
>>>> yarn-site.xml files for the NMs to point to the RM's host/port. They
>>>> default to localhost, which is what hello-samza is using.
>>>> 
>>>> On the Kafka side, the same thing applies: you'll need to configure
>>>> each broker with a unique broker id, etc.
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> On 2/3/14 11:25 AM, "[email protected]"
>>>> <[email protected]> wrote:
>>>> 
>>>>> Ah, makes sense
>>>>> 
>>>>> So to have a cluster setup with RM and NMs running on different
>>>>> nodes, can I reuse the "grid" script from "hello-samza"? Or will I
>>>>> have to do the setup separately and then change the config files on
>>>>> Samza?
>>>>> 
>>>>> Thanks,
>>>>> Sonali
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>> Sent: Monday, February 03, 2014 11:02 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Cluster Installation
>>>>> 
>>>>> Hey Sonali,
>>>>> 
>>>>> I believe the point at which YARN became version compatible for 2.*
>>>>> was at 2.1.0-beta. I believe 2.0.5 is not API compatible with later
>>>>> versions of YARN (e.g. 2.2). For this reason, you'll need to upgrade
>>>>> your YARN grid, or use a different one with a higher version.
>>>>> 
>>>>> For its part, Samza should work with YARN grids 2.1.0-beta and
>>>>> beyond, though I haven't tested this. The YARN community has given a
>>>>> commitment to maintaining API compatibility going forward for YARN
>>>>> 2.*, which means that future upgrades should not be required until
>>>>> YARN 3 comes out.
>>>>> 
>>>>> The rest of your understanding is correct. You can run a 1 RM, 2 NM
>>>>> kind of cluster, throw some Kafka brokers on there, and you should be
>>>>> good to go. You can also re-use your existing ZK, if you wish.
>>>>> 
>>>>> Cheers,
>>>>> Chris
>>>>> 
>>>>> On 2/3/14 10:42 AM, "[email protected]"
>>>>> <[email protected]> wrote:
>>>>> 
>>>>>> Thanks Chris/Gary.
>>>>>> 
>>>>>> I have an existing ZooKeeper and YARN cluster. However, the YARN
>>>>>> version that I have (that came preinstalled with Pivotal HD) is
>>>>>> 2.0.5. So from what you're saying I cannot reuse it for my Samza
>>>>>> deployment.
>>>>>> 
>>>>>> So then my option is:
>>>>>> 1. Reuse ZooKeeper. So I'll have to configure Samza to point to the
>>>>>> right cluster.
>>>>>> 2. Run Samza with its YARN grid and Kafka installation (I can do
>>>>>> this on multiple servers, right? 1 RM, 2 NM kind of situation).
>>>>>> 
>>>>>> Thanks,
>>>>>> Sonali
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>>> Sent: Friday, January 31, 2014 11:24 AM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Cluster Installation
>>>>>> 
>>>>>> Hey Sonali,
>>>>>> 
>>>>>> Everything Gary said is correct.
>>>>>> 
>>>>>> One other item of note is that if you're interested in running stuff
>>>>>> locally in a dev-mode fashion, you don't need YARN. You can use the
>>>>>> LocalJobFactory instead of the YarnJobFactory when configuring your
>>>>>> job's "job.factory.class" setting.
>>>>>> 
>>>>>> For "real" deployments, yes you'll need YARN, ZooKeeper, and Kafka.
>>>>>> They can be deployed using any standard way of shipping software
>>>>>> around to a cluster of machines.
>>>>>> 
>>>>>> Cheers,
>>>>>> Chris
>>>>>> 
>>>>>> On 1/31/14 12:58 AM, "Garry Turkington"
>>>>>> <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Sonali,
>>>>>>> 
>>>>>>> This was something that I had some questions about originally as
>>>>>>> well. In terms of required components, yes: for any size of Samza
>>>>>>> deployment you will need all those pieces.
>>>>>>> 
>>>>>>> In terms of actual deployment, from what I understand from the
>>>>>>> LinkedIn guys they do run Samza on a dedicated YARN grid that also
>>>>>>> has a Kafka broker collocated on each node. These decisions though
>>>>>>> appear to be more down to convenience than a hard requirement.
>>>>>>> 
>>>>>>> In my own setup I have existing ZooKeeper and Kafka clusters that
>>>>>>> I'm pointing Samza at, but I do need to run a dedicated YARN grid
>>>>>>> because my Hadoop cluster has a pre-2.2 version of YARN running on it.
>>>>>>> 
>>>>>>> So if you have existing components you can reuse them; if not, then
>>>>>>> repurposing the Hello Samza package is a good starting point to get
>>>>>>> all the things you want on the required hosts. The only caveat would
>>>>>>> be to not drop a ZK node on each host: the ZK quorum should follow the
>>>>>>> usual advice of an odd number of servers and likely no more than 3, 5
>>>>>>> or 7 depending on your deployment size.
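The reason behind the odd-number advice is that ZooKeeper needs a strict majority of servers to be up, so adding a fourth or sixth server does not let the ensemble survive any more failures than three or five would. A tiny Python sketch of that arithmetic (function names are just for illustration):

```python
def quorum_size(servers):
    """Smallest strict majority of an ensemble with `servers` members."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """How many servers can fail while a majority is still available."""
    return servers - quorum_size(servers)


for n in (3, 4, 5, 6, 7):
    print(n, "servers -> quorum", quorum_size(n), "tolerates", tolerated_failures(n))
# 3 servers -> quorum 2 tolerates 1
# 4 servers -> quorum 3 tolerates 1
# 5 servers -> quorum 3 tolerates 2
# 6 servers -> quorum 4 tolerates 2
# 7 servers -> quorum 4 tolerates 3
```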
>>>>>>> 
>>>>>>> Garry
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: [email protected]
>>>>>>> [mailto:[email protected]]
>>>>>>> Sent: 30 January 2014 23:38
>>>>>>> To: [email protected]
>>>>>>> Subject: Cluster Installation
>>>>>>> 
>>>>>>> Hi All,
>>>>>>> 
>>>>>>> I'm new to working with Samza and have been trying to figure out
>>>>>>> the best cluster configuration. I understand that Samza comes with
>>>>>>> YARN, Kafka and ZooKeeper out of the box. Is that the model just for
>>>>>>> a standalone/local configuration? What if I want a bigger cluster? Do
>>>>>>> I have to install YARN, Kafka and ZooKeeper separately? Any
>>>>>>> suggestions would be great!
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Sonali
>>>>>>> 
>>>>>>> Sonali Parthasarathy
>>>>>>> R&D Developer, Data Insights
>>>>>>> Accenture Technology Labs
>>>>>>> 703-341-7432
>>>>>>> 
>>>>>>> 
>>>>>>> ________________________________
>>>>>>> 
>>>>>>> This message is for the designated recipient only and may contain
>>>>>>> privileged, proprietary, or otherwise confidential information. If
>>>>>>> you have received it in error, please notify the sender immediately
>>>>>>> and delete the original. Any other use of the e-mail by you is
>>>>>>> prohibited.
>>>>>>> Where allowed by local law, electronic communications with
>>>>>>> Accenture and its affiliates, including e-mail and instant messaging
>>>>>>> (including content), may be scanned by our systems for the purposes
>>>>>>> of information security and assessment of internal compliance with
>>>>>>> Accenture policy.
>>>>>>> 
>>>>>>> www.accenture.com
>>>>>> 
>>>>> 
>>>> 
>>> 
>>
>
