Is there a different type of YARN HA?  It seems the method of HA for CDH5
uses the qjournal on top of the zkfc.

-Dan


On Wed, Mar 19, 2014 at 10:53 AM, Yan Fang <[email protected]> wrote:

> Hi Chris,
>
> I have Samza running on HA YARN, using the high-availability
> configuration. I am describing my rough approach here in case someone
> faces a similar problem.
>
> The HA YARN is from CDH5 beta 2, which is ZK-based. Just replacing the
> jar files does not seem to work, so the way I made it work is a little
> hacky: I changed samza-yarn slightly so that the client looks up the
> current active RM in ZooKeeper every time it submits an AM (HA YARN keeps
> the active RM's name in ZK). With that change, Samza works well. It is
> automatically restarted when the RM changes (that is, when the standby RM
> becomes active after the active RM fails).
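
A minimal sketch of the lookup described above, not the actual samza-yarn change. It assumes the active RM's ID has already been read from ZooKeeper; `lookup_active_rm_id`, the RM IDs, and the host names are hypothetical placeholders, and only the `yarn.resourcemanager.address.<rm-id>` key style comes from this thread.

```python
from typing import Callable, Mapping


def active_rm_address(conf: Mapping[str, str],
                      lookup_active_rm_id: Callable[[], str]) -> str:
    """Return the client address configured for the currently active RM.

    lookup_active_rm_id is a hypothetical callable standing in for the
    ZooKeeper read; it is not part of Samza or YARN.
    """
    rm_id = lookup_active_rm_id()
    key = "yarn.resourcemanager.address." + rm_id
    if key not in conf:
        raise KeyError("no address configured for RM id %r" % rm_id)
    return conf[key]


# Hypothetical two-RM configuration; host names are placeholders.
conf = {
    "yarn.resourcemanager.address.rm1": "rm1.example.com:8032",
    "yarn.resourcemanager.address.rm2": "rm2.example.com:8032",
}

# Resolved again before each AM submission, so a failover is picked up
# the next time the client submits.
print(active_rm_address(conf, lambda: "rm2"))
```

Resolving the address on every submission, rather than caching it, is what lets the client follow a failover without a restart.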
>
> Hope someone has a better idea for doing this. Thank you.
>
> Cheers,
>
> Fang, Yan
> [email protected]
> +1 (206) 849-4108
>
>
> On Mon, Mar 10, 2014 at 4:35 PM, Yan Fang <[email protected]> wrote:
>
> > Hi Chris,
> >
> > Thank you! You are correct, I am actually working on a CDH5 beta version.
> > I will definitely try what you recommended and run some experiments to see
> > how Samza performs.
> >
> > Cheers,
> >
> > Fang, Yan
> > [email protected]
> > +1 (206) 849-4108
> >
> >
> > On Mon, Mar 10, 2014 at 3:54 PM, Chris Riccomini <[email protected]> wrote:
> >
> >> Hey Yan,
> >>
> >> I'm not aware of anyone successfully running Samza with CDH5's HA YARN.
> >> As far as I understand, those patches are not fully merged into Apache
> >> yet (I could be wrong, though).
> >>
> >> At a minimum, you'll probably need to replace Samza's 2.2 YARN jars with
> >> the CDH5 jars, so that Samza properly interprets the different configs
> >> (e.g. the new RM style of config, which you've mentioned).
> >>
> >> I'm not sure how Samza's YARN AM will behave when the RM is failed over.
> >> You'll have to experiment with this and see. If you find anything out,
> >> it'd be very very useful if you could share it with the rest of us.
> >> Samza and HA RMs is something that we're investigating as well.
> >>
> >> Cheers,
> >> Chris
> >>
> >> On 3/10/14 12:11 PM, "Yan Fang" <[email protected]> wrote:
> >>
> >> >Hi All,
> >> >
> >> >Happy daylight saving! I am wondering if anyone on this mailing list has
> >> >successfully run Samza on an HA YARN cluster?
> >> >
> >> >We are trying to run Samza on CDH5, which has HA YARN configurations. I
> >> >am able to run Samza only by updating yarn-default.xml (changing
> >> >yarn.resourcemanager.address), the same approach Nirmal Kumar mentioned
> >> >in "Running Samza on multi node". Otherwise, it always connects to
> >> >0.0.0.0 from yarn-default.xml. (I am sure I set the conf file and
> >> >YARN_HOME correctly.)
> >> >
> >> >So my questions are:
> >> >1. Can't Samza interpret the HA YARN configuration file correctly? (Is
> >> >that because the HA YARN configuration uses, say,
> >> >yarn.resourcemanager.address.*rm15* instead of
> >> >yarn.resourcemanager.address?)
> >> >
> >> >2. Is it possible to switch to a new RM automatically when one is down?
> >> >We have two RMs, one active and one standby, but I can only put one RM
> >> >address in yarn-default.xml. I am wondering if it is possible to detect
> >> >the active RM automatically in Samza (or by some other method)?
> >> >
> >> >3. Has anyone had any luck leveraging HA YARN?
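
To make questions 1 and 2 concrete, here is a rough sketch of how a client could collect every per-RM address from an HA-style configuration and then pick the one that answers as active. It assumes the RM IDs are listed under `yarn.resourcemanager.ha.rm-ids`; the `is_active` probe, the ID `rm16`, and the host names are hypothetical, and this is not how Samza currently behaves.

```python
from typing import Callable, List, Mapping, Optional

# Effective default when nothing is configured (from yarn-default.xml).
DEFAULT_RM_ADDRESS = "0.0.0.0:8032"


def rm_addresses(conf: Mapping[str, str]) -> List[str]:
    """List candidate RM addresses, HA-style keys first, else the single key."""
    raw_ids = conf.get("yarn.resourcemanager.ha.rm-ids", "")
    rm_ids = [i.strip() for i in raw_ids.split(",") if i.strip()]
    if not rm_ids:
        # Non-HA configuration: fall back to the single address key.
        return [conf.get("yarn.resourcemanager.address", DEFAULT_RM_ADDRESS)]
    keys = ["yarn.resourcemanager.address." + i for i in rm_ids]
    return [conf[k] for k in keys if k in conf]


def first_active(addresses: List[str],
                 is_active: Callable[[str], bool]) -> Optional[str]:
    """Return the first address the (hypothetical) probe reports as active."""
    for address in addresses:
        if is_active(address):
            return address
    return None


# Hypothetical HA configuration; host names are placeholders.
ha_conf = {
    "yarn.resourcemanager.ha.rm-ids": "rm15, rm16",
    "yarn.resourcemanager.address.rm15": "rm15.example.com:8032",
    "yarn.resourcemanager.address.rm16": "rm16.example.com:8032",
}

candidates = rm_addresses(ha_conf)
print(first_active(candidates, lambda a: a.startswith("rm16")))
```

A client that reads only `yarn.resourcemanager.address` never sees the suffixed keys, which would explain the fallback to 0.0.0.0 described above.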
> >> >
> >> >Thank you.
> >> >
> >> >Cheers,
> >> >
> >> >Fang, Yan
> >> >[email protected]
> >> >+1 (206) 849-4108
> >> >
> >> >
> >> >On Fri, Feb 21, 2014 at 3:23 PM, Chris Riccomini <[email protected]> wrote:
> >> >
> >> >> Hey Ethan,
> >> >>
> >> >> YARN's HA support is marginal right now, and we're still investigating
> >> >> this stuff. Some useful things to read are:
> >> >>
> >> >> * https://issues.apache.org/jira/browse/YARN-128
> >> >> * https://issues.apache.org/jira/browse/YARN-149
> >> >> * https://issues.apache.org/jira/browse/YARN-353
> >> >> * https://issues.apache.org/jira/browse/YARN-556
> >> >>
> >> >>
> >> >> Also, CDH seems to be packaging some of the ZK-based HA stuff already:
> >> >>
> >> >> https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/cdh5hag_cfg_RM_HA.html
> >> >>
> >> >>
> >> >> At LI, we're still experimenting with the best setup, so my guidance
> >> >> might not be state of the art. We currently configure the YARN RM's
> >> >> store (yarn.resourcemanager.store.class) to use the file system store
> >> >> (org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore).
> >> >> The failover is a manual operation where we copy the RM state to a new
> >> >> machine, and then start the RM on that machine. You then need to front
> >> >> the RM with a VIP or DNS entry, which you can update to point to the
> >> >> new RM machine when a failover occurs. The NMs need to be configured to
> >> >> point to this VIP/DNS entry, so that when a failover occurs, the NMs
> >> >> don't need to update their yarn-site.xml files.
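
As an illustration of the setup described above, the sketch below renders a yarn-site.xml fragment with the file system state store and a VIP/DNS name for the RM. The store class and `yarn.resourcemanager.store.class` come from this thread; `yarn.resourcemanager.recovery.enabled`, `yarn.resourcemanager.hostname`, and the host name `rm-vip.example.com` are assumptions to check against your Hadoop version, not LinkedIn's actual configuration.

```python
import xml.etree.ElementTree as ET
from typing import Mapping

# Assumed property set; the host name is a placeholder for the VIP/DNS entry.
RM_PROPS = {
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.resourcemanager.store.class":
        "org.apache.hadoop.yarn.server.resourcemanager.recovery."
        "FileSystemRMStateStore",
    "yarn.resourcemanager.hostname": "rm-vip.example.com",
}


def render_yarn_site(props: Mapping[str, str]) -> str:
    """Render properties in the Hadoop <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(render_yarn_site(RM_PROPS))
```

Because the NMs and clients see only the VIP/DNS name, a manual failover changes where that name points rather than any yarn-site.xml file.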
> >> >>
> >> >>
> >> >> It sounds like in the future you won't need to use VIPs/DNS entries.
> >> >> You should probably also email the YARN mailing list, just in case
> >> >> we're misinformed or unaware of some new updates.
> >> >>
> >> >> Cheers,
> >> >> Chris
> >> >>
> >> >> On 2/21/14 2:27 PM, "Ethan Setnik" <[email protected]> wrote:
> >> >>
> >> >> >I'm looking to deploy Samza on AWS infrastructure in an HA
> >> >> >configuration. I have a clear picture of how to configure all the
> >> >> >components such that they do not contain any single point of failure.
> >> >> >
> >> >> >I'm stuck, however, when it comes to the YARN architecture.  It seems
> >> >> >that YARN relies on the single-master / multi-slave pattern as
> >> >> >described in the YARN documentation.  This introduces a single point of
> >> >> >failure at the ResourceManager level such that a failed ResourceManager
> >> >> >will fail the entire YARN cluster.  How does LinkedIn architect an HA
> >> >> >configuration for Samza on YARN such that a complete instance failure
> >> >> >of ResourceManager provides failover for the YARN cluster?
> >> >> >
> >> >> >Thanks for your help.
> >> >> >
> >> >> >Best,
> >> >> >Ethan
> >> >> >
> >> >> >
> >> >> >--
> >> >> >Ethan Setnik
> >> >> >MobileAware
> >> >> >
> >> >> >m: +1 617 513 2052
> >> >> >e: [email protected]
> >> >>
> >> >>
> >>
> >>
> >
>



-- 
Dan Di Spaltro
