Is there a different type of YARN HA? It seems the HA method in CDH5 uses the QJournal on top of the ZKFC.
-Dan

On Wed, Mar 19, 2014 at 10:53 AM, Yan Fang <[email protected]> wrote:
> Hi Chris,
>
> I have gotten Samza running on HA YARN by leveraging the high-availability
> configuration. I'm posting my rough approach here in case someone else
> faces a similar problem.
>
> The HA YARN is from the CDH5-beta 2 version, which is ZK-based HA YARN.
> Just replacing the jar files does not seem to work, so the way I made it
> work is a little hacky: I changed samza-yarn a little so that the client
> checks ZooKeeper for the currently active RM every time it submits the AM
> (HA YARN keeps the active RM's name in ZK). With this change Samza works
> well: the job is restarted automatically when the RM changes (that is,
> when the standby RM becomes active after the active RM fails).
>
> I hope someone has a better idea for doing this. Thank you.
>
> Cheers,
>
> Fang, Yan
> [email protected]
> +1 (206) 849-4108
>
>
> On Mon, Mar 10, 2014 at 4:35 PM, Yan Fang <[email protected]> wrote:
>
>> Hi Chris,
>>
>> Thank you! You are correct, I am actually working on a CDH5-beta version.
>> I will definitely try what you recommended and do some experiments to see
>> how Samza performs.
>>
>> Cheers,
>>
>> Fang, Yan
>> [email protected]
>> +1 (206) 849-4108
>>
>>
>> On Mon, Mar 10, 2014 at 3:54 PM, Chris Riccomini <[email protected]> wrote:
>>
>>> Hey Yan,
>>>
>>> I'm not aware of anyone successfully running Samza with CDH5's HA YARN. As
>>> far as I understand, those patches are not fully merged into Apache yet
>>> (I could be wrong, though).
>>>
>>> At a minimum, you'll probably need to replace Samza's 2.2 YARN jars with
>>> the CDH5 jars, so that Samza properly interprets the different configs
>>> (e.g. the new RM style of config, which you've mentioned).
>>>
>>> I'm not sure how Samza's YARN AM will behave when the RM is failed over.
>>> You'll have to experiment with this and see. If you find anything out,
>>> it'd be very very useful if you could share it with the rest of us.
>>> Samza and HA RMs is something that we're investigating as well.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 3/10/14 12:11 PM, "Yan Fang" <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Happy daylight saving! I am wondering if anyone on this mailing list has
>>>> successfully run Samza in an HA YARN cluster?
>>>>
>>>> We are trying to run Samza in CDH5, which has HA YARN configurations. I am
>>>> able to run Samza only by updating yarn-default.xml (changing
>>>> yarn.resourcemanager.address), the same approach Nirmal Kumar mentioned in
>>>> "Running Samza on multi node". Otherwise, it always connects to 0.0.0.0,
>>>> the default in yarn-default.xml. (I am sure I set the conf file and
>>>> YARN_HOME correctly.)
>>>>
>>>> So my questions are:
>>>> 1. Can't Samza interpret the HA YARN configuration file correctly? (Is that
>>>> because the HA YARN configuration uses, say,
>>>> yarn.resourcemanager.address.*rm15* instead of
>>>> yarn.resourcemanager.address?)
>>>>
>>>> 2. Is it possible to switch to a new RM automatically when one is down?
>>>> We have two RMs, one active and one standby, but I can only put one RM
>>>> address in yarn-default.xml. Is it possible to detect the active RM
>>>> automatically in Samza (or by some other method)?
>>>>
>>>> 3. Has anyone had any luck leveraging HA YARN?
>>>>
>>>> Thank you.
>>>>
>>>> Cheers,
>>>>
>>>> Fang, Yan
>>>> [email protected]
>>>> +1 (206) 849-4108
>>>>
>>>>
>>>> On Fri, Feb 21, 2014 at 3:23 PM, Chris Riccomini <[email protected]> wrote:
>>>>
>>>>> Hey Ethan,
>>>>>
>>>>> YARN's HA support is marginal right now, and we're still investigating
>>>>> this stuff.
>>>>> Some useful things to read are:
>>>>>
>>>>> * https://issues.apache.org/jira/browse/YARN-128
>>>>> * https://issues.apache.org/jira/browse/YARN-149
>>>>> * https://issues.apache.org/jira/browse/YARN-353
>>>>> * https://issues.apache.org/jira/browse/YARN-556
>>>>>
>>>>> Also, CDH seems to be packaging some of the ZK-based HA stuff already:
>>>>>
>>>>> https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/cdh5hag_cfg_RM_HA.html
>>>>>
>>>>> At LI, we're still experimenting with the best setup, so my guidance might
>>>>> not be state of the art. We currently configure the YARN RM's store
>>>>> (yarn.resourcemanager.store.class) to use the file system store
>>>>> (org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore).
>>>>> The failover is a manual operation where we copy the RM state to a new
>>>>> machine and then start the RM on that machine. You then need to front
>>>>> the RM with a VIP or DNS entry, which you can update to point to the new
>>>>> RM machine when a failover occurs. The NMs need to be configured to point
>>>>> to this VIP/DNS entry, so that when a failover occurs, the NMs don't need
>>>>> to update their yarn-site.xml files.
>>>>>
>>>>> It sounds like in the future you won't need to use VIPs/DNS entries. You
>>>>> should probably also email the YARN mailing list, just in case we're
>>>>> misinformed or unaware of some new updates.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On 2/21/14 2:27 PM, "Ethan Setnik" <[email protected]> wrote:
>>>>>
>>>>>> I'm looking to deploy Samza on AWS infrastructure in an HA configuration.
>>>>>> I have a clear picture of how to configure all the components so that
>>>>>> they do not contain any single point of failure.
>>>>>>
>>>>>> I'm stuck, however, when it comes to the YARN architecture. It seems that
>>>>>> YARN relies on the single-master / multi-slave pattern as described in the
>>>>>> YARN documentation. This introduces a single point of failure at the
>>>>>> ResourceManager level, such that a failed ResourceManager will fail the
>>>>>> entire YARN cluster. How does LinkedIn architect an HA configuration for
>>>>>> Samza on YARN so that the YARN cluster fails over when the
>>>>>> ResourceManager instance fails completely?
>>>>>>
>>>>>> Thanks for your help.
>>>>>>
>>>>>> Best,
>>>>>> Ethan
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ethan Setnik
>>>>>> MobileAware
>>>>>>
>>>>>> m: +1 617 513 2052
>>>>>> e: [email protected]

--
Dan Di Spaltro
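Yan's workaround in this thread (re-reading the active RM from ZooKeeper before every AM submission) and the config mismatch behind his first question can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual samza-yarn change. It assumes the following:

* The yarn-site.xml properties have already been loaded into a dict.
* The property names follow Hadoop 2.x RM HA naming: yarn.resourcemanager.ha.enabled, yarn.resourcemanager.ha.rm-ids, and the per-RM yarn.resourcemanager.address.<rm-id> form behind the .rm15 suffix.
* The ZooKeeper read is stubbed as a caller-supplied read_active_rm_id function, because the thread does not describe the znode layout.

The rm16 id and the hostnames are made up.

```python
from typing import Callable, Dict, Optional

# Assumed Hadoop 2.x property names; "rm15" in the thread is one such rm-id suffix.
HA_ENABLED = "yarn.resourcemanager.ha.enabled"
HA_RM_IDS = "yarn.resourcemanager.ha.rm-ids"
RM_ADDRESS = "yarn.resourcemanager.address"
DEFAULT_RM_ADDRESS = "0.0.0.0:8032"  # stock default when nothing overrides it


def rm_addresses(conf: Dict[str, str]) -> Dict[str, str]:
    """Map each configured RM id to its address ('' is the single non-HA RM)."""
    if conf.get(HA_ENABLED, "false").strip().lower() != "true":
        return {"": conf.get(RM_ADDRESS, DEFAULT_RM_ADDRESS)}
    rm_ids = [i.strip() for i in conf.get(HA_RM_IDS, "").split(",") if i.strip()]
    # HA configs carry one address per RM, keyed by the rm-id suffix.
    # Assumption: each per-RM address is set explicitly (no hostname derivation).
    return {rm_id: conf[f"{RM_ADDRESS}.{rm_id}"] for rm_id in rm_ids}


def active_rm_address(
    conf: Dict[str, str],
    read_active_rm_id: Callable[[], Optional[str]],
) -> str:
    """Return the address to submit the AM to; call it before every submission."""
    addresses = rm_addresses(conf)
    if len(addresses) == 1:
        # Non-HA (or single-RM) setup: nothing to look up.
        return next(iter(addresses.values()))
    # Hypothetical stand-in for reading the active RM's id from ZooKeeper.
    active_id = read_active_rm_id()
    if active_id not in addresses:
        raise RuntimeError(f"no configured address for active RM {active_id!r}")
    return addresses[active_id]


if __name__ == "__main__":
    conf = {
        HA_ENABLED: "true",
        HA_RM_IDS: "rm15,rm16",  # rm16 and both hostnames are made up
        f"{RM_ADDRESS}.rm15": "rm-a.example.com:8032",
        f"{RM_ADDRESS}.rm16": "rm-b.example.com:8032",
    }
    # Pretend ZooKeeper reports rm16 as active (e.g. after a failover).
    print(active_rm_address(conf, lambda: "rm16"))  # rm-b.example.com:8032
    print(active_rm_address({}, lambda: None))  # 0.0.0.0:8032
```

Resolving the address on every submission, rather than once at startup, is what lets the client follow a failover. With HA disabled, the sketch falls back to the plain yarn.resourcemanager.address value or, failing that, 0.0.0.0:8032. That fallback matches the 0.0.0.0 behavior Yan ran into.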
