Hi Chris,

Thank you! You are correct, I am actually working with a CDH5 beta version. I will definitely try what you recommended and do some experiments to see how Samza performs.
Cheers,

Fang, Yan
[email protected]
+1 (206) 849-4108


On Mon, Mar 10, 2014 at 3:54 PM, Chris Riccomini <[email protected]> wrote:

> Hey Yan,
>
> I'm not aware of anyone successfully running Samza with CDH5's HA YARN. As
> far as I understand, those patches are not fully merged into Apache yet
> (I could be wrong, though).
>
> At a minimum, you'll probably need to replace Samza's 2.2 YARN jars with
> the CDH5 jars, so that Samza properly interprets the different configs
> (e.g. the new RM style of config, which you've mentioned).
>
> I'm not sure how Samza's YARN AM will behave when the RM is failed over.
> You'll have to experiment with this and see. If you find anything out,
> it'd be very useful if you could share it with the rest of us. Samza
> and HA RMs are something that we're investigating as well.
>
> Cheers,
> Chris
>
> On 3/10/14 12:11 PM, "Yan Fang" <[email protected]> wrote:
>
> >Hi All,
> >
> >Happy daylight saving! I am wondering if anyone on this mailing list has
> >successfully run Samza in an HA YARN cluster.
> >
> >We are trying to run Samza on CDH5, which has an HA YARN configuration.
> >I am only able to run Samza by updating yarn-default.xml (changing
> >yarn.resourcemanager.address), the same approach Nirmal Kumar mentioned
> >in "Running Samza on multi node". Otherwise, it always connects to the
> >0.0.0.0 address in yarn-default.xml. (I am sure I set the conf file and
> >YARN_HOME correctly.)
> >
> >So my questions are:
> >
> >1. Can't Samza interpret the HA YARN configuration file correctly? (Is
> >that because the HA YARN configuration uses, say,
> >yarn.resourcemanager.address.rm15 instead of
> >yarn.resourcemanager.address?)
> >
> >2. Is it possible to switch to a new RM automatically when one is down?
> >We have two RMs, one active and one standby, but I can only put one RM
> >address in yarn-default.xml. I am wondering if it is possible to detect
> >the active RM automatically in Samza (or by some other method)?
> >
> >3. Has anyone had luck leveraging HA YARN?
> >
> >Thank you.
> >
> >Cheers,
> >
> >Fang, Yan
> >[email protected]
> >+1 (206) 849-4108
> >
> >
> >On Fri, Feb 21, 2014 at 3:23 PM, Chris Riccomini
> ><[email protected]> wrote:
> >
> >> Hey Ethan,
> >>
> >> YARN's HA support is marginal right now, and we're still investigating
> >> this stuff. Some useful things to read are:
> >>
> >> * https://issues.apache.org/jira/browse/YARN-128
> >> * https://issues.apache.org/jira/browse/YARN-149
> >> * https://issues.apache.org/jira/browse/YARN-353
> >> * https://issues.apache.org/jira/browse/YARN-556
> >>
> >> Also, CDH seems to be packaging some of the ZK-based HA stuff already:
> >>
> >> https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/cdh5hag_cfg_RM_HA.html
> >>
> >> At LI, we're still experimenting with the best setup, so my guidance
> >> might not be state of the art. We currently configure the YARN RM's
> >> store (yarn.resourcemanager.store.class) to use the file system store
> >> (org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore).
> >> Failover is a manual operation where we copy the RM state to a new
> >> machine, and then start the RM on that machine. You then need to front
> >> the RM with a VIP or DNS entry, which you can update to point to the
> >> new RM machine when a failover occurs. The NMs need to be configured
> >> to point to this VIP/DNS entry, so that when a failover occurs, the
> >> NMs don't need to update their yarn-site.xml files.
> >>
> >> It sounds like in the future you won't need to use VIPs/DNS entries.
> >> You should probably also email the YARN mailing list, just in case
> >> we're misinformed or unaware of some new updates.
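The file-system state store Chris describes above would be set in yarn-site.xml roughly as follows; this is a sketch, and the HDFS URI is hypothetical:

```xml
<!-- Sketch of the RM state-store setup described above.
     The hdfs:// path is a hypothetical example. -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://namenode.example.com:8020/yarn/rm-state</value>
</property>
```

With this in place, the manual failover described above amounts to starting a new RM pointed at the same state-store URI and repointing the VIP/DNS entry at it.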
> >>
> >> Cheers,
> >> Chris
> >>
> >> On 2/21/14 2:27 PM, "Ethan Setnik" <[email protected]> wrote:
> >>
> >> >I'm looking to deploy Samza on AWS infrastructure in an HA
> >> >configuration. I have a clear picture of how to configure all the
> >> >components such that they do not contain any single point of failure.
> >> >
> >> >I'm stuck, however, when it comes to the YARN architecture. It seems
> >> >that YARN relies on the single-master / multi-slave pattern described
> >> >in the YARN documentation. This introduces a single point of failure
> >> >at the ResourceManager level, such that a failed ResourceManager will
> >> >fail the entire YARN cluster. How does LinkedIn architect an HA
> >> >configuration for Samza on YARN such that a complete instance failure
> >> >of the ResourceManager provides failover for the YARN cluster?
> >> >
> >> >Thanks for your help.
> >> >
> >> >Best,
> >> >Ethan
> >> >
> >> >--
> >> >Ethan Setnik
> >> >MobileAware
> >> >
> >> >m: +1 617 513 2052
> >> >e: [email protected]
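For reference, the rm-id-suffixed properties Yan asks about come from a CDH5-style RM HA block in yarn-site.xml. A minimal sketch (the rm ids, hostnames, and ports here are hypothetical examples, not a tested configuration) looks roughly like:

```xml
<!-- Sketch of an HA RM config of the kind discussed in this thread.
     rm15/rm16 and the hostnames are hypothetical examples. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm15,rm16</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm15</name>
  <value>rm15.example.com:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm16</name>
  <value>rm16.example.com:8032</value>
</property>
```

A plain YARN 2.2 client that only reads yarn.resourcemanager.address will not see these suffixed values and falls back to the 0.0.0.0 default, which is consistent with the behaviour Yan reports; this is why swapping in the CDH5 client jars matters. In HA-enabled builds, the active RM can be checked manually with `yarn rmadmin -getServiceState <rm-id>`.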
