Thanks Arun and Subru for working on this! +1 (non-binding) pending YARN-7453.
1) Setup RM HA
2) Verified leveldb/zookeeper scheduler configuration API works via REST/CLI
3) Verified configuration changes persist across restart
4) yarn rmadmin -refreshQueues works when scheduler configuration API disabled (and vice-versa)

Jonathan Hung

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <ebad...@oath.com> wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wheele...@gmail.com> wrote:
>
>> Sunil / Rohith,
>>
>> Could you check if your configs are same as Jonathan posted configs?
>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>
>> And could you try if using Jonathan's configs can still reproduce the
>> issue?
>>
>> Thanks,
>> Wangda
>>
>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <asur...@apache.org> wrote:
>>
>> > Thanks for testing Rohith and Sunil
>> >
>> > Can you please confirm if it is not a config issue at your end ?
>> > We (both Jonathan and myself) just tried testing this on a fresh cluster
>> > (both automatic and manual) and we are not able to reproduce this. I've
>> > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453>
>> > JIRA with details of testing.
>> >
>> > Cheers
>> > -Arun/Subru
>> >
>> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharm...@apache.org> wrote:
>> >
>> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>> > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>> > > issue.
>> > >
>> > > - Rohith Sharma K S
>> > >
>> > > On 7 November 2017 at 16:44, Sunil G <sun...@apache.org> wrote:
>> > >
>> > >> Hi Subru and Arun.
>> > >>
>> > >> Thanks for driving 2.9 release. Great work!
>> > >>
>> > >> I installed cluster built from source.
>> > >> - Ran few MR jobs with application priority enabled. Runs fine.
>> > >> - Accessed new UI and it also seems fine.
>> > >>
>> > >> However I am also getting same issue as Rohith reported.
>> > >> - Started an HA cluster
>> > >> - Pushed RM to standby
>> > >> - Pushed back RM to active then seeing an exception.
>> > >>
>> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> > >>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> > >>
>> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> > >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>> > >>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>> > >>
>> > >> Will check and post more details,
>> > >>
>> > >> - Sunil
>> > >>
>> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharm...@apache.org> wrote:
>> > >>
>> > >> > Thanks Subru/Arun for the great work!
>> > >> >
>> > >> > Downloaded source and built from it. Deployed RM HA non-secured cluster
>> > >> > along with new YARN UI and ATSv2.
>> > >> >
>> > >> > I am facing basic RM HA switch issue after first time successful start.
>> > >> > *Can anyone else is facing this issue?*
>> > >> >
>> > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switch to
>> > >> > active successfully. Exception trace I see from the log is
>> > >> >
>> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>> > >> >     ... 4 more
>> > >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> > >> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>> > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>> > >> >     at java.security.AccessController.doPrivileged(Native Method)
>> > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
>> > >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>> > >> >     ... 5 more
>> > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> > >> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>> > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>> > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>> > >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>> > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>> > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>> > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>> > >> >     ... 13 more
>> > >> >
>> > >> > Thanks & Regards
>> > >> > Rohith Sharma K S
>> > >> >
>> > >> > On 4 November 2017 at 04:20, Arun Suresh <asur...@apache.org> wrote:
>> > >> >
>> > >> > > Hi folks,
>> > >> > >
>> > >> > > Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
>> > >> > > will be the latest stable/production release for Apache Hadoop - it
>> > >> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
>> > >> > > fixes new fixed issues since 2.8.2 .
>> > >> > >
>> > >> > > More information about the 2.9.0 release plan can be found here:
>> > >> > > https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>> > >> > >
>> > >> > > New RC is available at:
>> > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>> > >> > >
>> > >> > > The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>> > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>> > >> > >
>> > >> > > The maven artifacts are available via repository.apache.org at:
>> > >> > > https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>> > >> > >
>> > >> > > Please try the release and vote; the vote will run for the usual 5
>> > >> > > days, ending on 11/10/2017 4pm PST time.
>> > >> > >
>> > >> > > Thanks,
>> > >> > >
>> > >> > > Arun/Subru