Hi Chris,

Thank you! You are correct, I am actually working with a CDH5 beta version. I will definitely try what you recommended and do some experiments to see how Samza performs.
Cheers,

Fang, Yan
[email protected]
+1 (206) 849-4108


On Mon, Mar 10, 2014 at 3:54 PM, Chris Riccomini <[email protected]> wrote:

> Hey Yan,
>
> I'm not aware of anyone successfully running Samza with CDH5's HA YARN. As
> far as I understand, those patches are not fully merged into Apache yet
> (I could be wrong, though).
>
> At a minimum, you'll probably need to replace Samza's 2.2 YARN jars with
> the CDH5 jars, so that Samza properly interprets the different configs
> (e.g. the new RM style of config, which you've mentioned).
>
> I'm not sure how Samza's YARN AM will behave when the RM is failed over.
> You'll have to experiment with this and see. If you find anything out,
> it'd be very useful if you could share it with the rest of us. Samza
> and HA RMs are something that we're investigating as well.
>
> Cheers,
> Chris
>
> On 3/10/14 12:11 PM, "Yan Fang" <[email protected]> wrote:
>
> >Hi All,
> >
> >Happy daylight saving! I am wondering if anyone on this mailing list has
> >successfully run Samza in an HA YARN cluster.
> >
> >We are trying to run Samza on CDH5, which has an HA YARN configuration.
> >I am only able to run Samza by updating yarn-default.xml (changing
> >yarn.resourcemanager.address), the same approach Nirmal Kumar mentioned
> >in "Running Samza on multi node". Otherwise, it always connects to the
> >0.0.0.0 address in yarn-default.xml. (I am sure I set the conf file and
> >YARN_HOME correctly.)
> >
> >So my questions are:
> >
> >1. Can't Samza interpret the HA YARN configuration file correctly? (Is
> >that because the HA YARN configuration uses, say,
> >yarn.resourcemanager.address.rm15 instead of
> >yarn.resourcemanager.address?)
> >
> >2. Is it possible to switch to a new RM automatically when one is down?
> >We have two RMs, one active and one standby, but I can only put one RM
> >address in yarn-default.xml. I am wondering if it is possible to detect
> >the active RM automatically in Samza (or by some other method)?
> >
> >3. Has anyone had luck leveraging HA YARN?
> >
> >Thank you.
> >
> >Cheers,
> >
> >Fang, Yan
> >[email protected]
> >+1 (206) 849-4108
> >
> >
> >On Fri, Feb 21, 2014 at 3:23 PM, Chris Riccomini
> ><[email protected]> wrote:
> >
> >> Hey Ethan,
> >>
> >> YARN's HA support is marginal right now, and we're still investigating
> >> this stuff. Some useful things to read are:
> >>
> >> * https://issues.apache.org/jira/browse/YARN-128
> >> * https://issues.apache.org/jira/browse/YARN-149
> >> * https://issues.apache.org/jira/browse/YARN-353
> >> * https://issues.apache.org/jira/browse/YARN-556
> >>
> >> Also, CDH seems to be packaging some of the ZK-based HA stuff already:
> >>
> >> https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/cdh5hag_cfg_RM_HA.html
> >>
> >> At LI, we're still experimenting with the best setup, so my guidance
> >> might not be state of the art. We currently configure the YARN RM's
> >> store (yarn.resourcemanager.store.class) to use the file system store
> >> (org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore).
> >> Failover is a manual operation where we copy the RM state to a new
> >> machine, and then start the RM on that machine. You then need to front
> >> the RM with a VIP or DNS entry, which you can update to point to the
> >> new RM machine when a failover occurs. The NMs need to be configured
> >> to point to this VIP/DNS entry, so that when a failover occurs, the
> >> NMs don't need to update their yarn-site.xml files.
> >>
> >> It sounds like in the future you won't need to use VIPs/DNS entries.
> >> You should probably also email the YARN mailing list, just in case
> >> we're misinformed or unaware of some new updates.
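The file-system state store Chris describes above would be set in yarn-site.xml roughly as follows; this is a sketch, and the HDFS URI is hypothetical:

```xml
<!-- Sketch of the RM state-store setup described above.
     The hdfs:// path is a hypothetical example. -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://namenode.example.com:8020/yarn/rm-state</value>
</property>
```

With this in place, the manual failover described above amounts to starting a new RM pointed at the same state-store URI and repointing the VIP/DNS entry at it.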
> >>
> >> Cheers,
> >> Chris
> >>
> >> On 2/21/14 2:27 PM, "Ethan Setnik" <[email protected]> wrote:
> >>
> >> >I'm looking to deploy Samza on AWS infrastructure in an HA
> >> >configuration. I have a clear picture of how to configure all the
> >> >components such that they do not contain any single point of failure.
> >> >
> >> >I'm stuck, however, when it comes to the YARN architecture. It seems
> >> >that YARN relies on the single-master / multi-slave pattern described
> >> >in the YARN documentation. This introduces a single point of failure
> >> >at the ResourceManager level, such that a failed ResourceManager will
> >> >fail the entire YARN cluster. How does LinkedIn architect an HA
> >> >configuration for Samza on YARN such that a complete instance failure
> >> >of the ResourceManager provides failover for the YARN cluster?
> >> >
> >> >Thanks for your help.
> >> >
> >> >Best,
> >> >Ethan
> >> >
> >> >--
> >> >Ethan Setnik
> >> >MobileAware
> >> >
> >> >m: +1 617 513 2052
> >> >e: [email protected]
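For reference, the rm-id-suffixed properties Yan asks about come from a CDH5-style RM HA block in yarn-site.xml. A minimal sketch (the rm ids, hostnames, and ports here are hypothetical examples, not a tested configuration) looks roughly like:

```xml
<!-- Sketch of an HA RM config of the kind discussed in this thread.
     rm15/rm16 and the hostnames are hypothetical examples. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm15,rm16</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm15</name>
  <value>rm15.example.com:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm16</name>
  <value>rm16.example.com:8032</value>
</property>
```

A plain YARN 2.2 client that only reads yarn.resourcemanager.address will not see these suffixed values and falls back to the 0.0.0.0 default, which is consistent with the behaviour Yan reports; this is why swapping in the CDH5 client jars matters. In HA-enabled builds, the active RM can be checked manually with `yarn rmadmin -getServiceState <rm-id>`.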
