Thanks Arun and Subru for working on this!

+1 (non-binding) pending YARN-7453.

1) Set up RM HA
2) Verified the leveldb/zookeeper scheduler configuration API works via REST/CLI
3) Verified configuration changes persist across restart
4) Verified yarn rmadmin -refreshQueues works when the scheduler configuration
API is disabled (and vice versa)


Jonathan Hung

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <ebad...@oath.com> wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wheele...@gmail.com> wrote:
>
>> Sunil / Rohith,
>>
>> Could you check whether your configs are the same as the configs Jonathan posted?
>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>
>> And could you check whether the issue still reproduces with Jonathan's
>> configs?
>>
>> Thanks,
>> Wangda
>>
>>
>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <asur...@apache.org> wrote:
>>
>> > Thanks for testing Rohith and Sunil
>> >
>> > Can you please confirm that it is not a config issue at your end?
>> > We (both Jonathan and myself) just tried testing this on a fresh cluster
>> > (both automatic and manual) and we are not able to reproduce this. I've
>> > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453>
>> > JIRA with details of the testing.
>> >
>> > Cheers
>> > -Arun/Subru
>> >
>> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharm...@apache.org> wrote:
>> >
>> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>> > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>> > > issue.
>> > >
>> > > - Rohith Sharma K S
>> > >
>> > > On 7 November 2017 at 16:44, Sunil G <sun...@apache.org> wrote:
>> > >
>> > >> Hi Subru and Arun.
>> > >>
>> > >> Thanks for driving 2.9 release. Great work!
>> > >>
>> > >> I installed cluster built from source.
>> > >> - Ran few MR jobs with application priority enabled. Runs fine.
>> > >> - Accessed new UI and it also seems fine.
>> > >>
>> > >> However I am also getting same issue as Rohith reported.
>> > >> - Started an HA cluster
>> > >> - Pushed RM to standby
>> > >> - Pushed back RM to active then seeing an exception.
>> > >>
>> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> > >>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> > >>
>> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> > >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>> > >>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>> > >>
>> > >> Will check and post more details,
>> > >>
>> > >> - Sunil
>> > >>
>> > >>
>> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharm...@apache.org> wrote:
>> > >>
>> > >> > Thanks Subru/Arun for the great work!
>> > >> >
>> > >> > Downloaded the source and built from it. Deployed a non-secured RM HA
>> > >> > cluster along with the new YARN UI and ATSv2.
>> > >> >
>> > >> > I am facing a basic RM HA switch issue after the first successful start.
>> > >> > *Is anyone else facing this issue?*
>> > >> >
>> > >> > When the RM is switched from ACTIVE to STANDBY and back to ACTIVE, it never
>> > >> > transitions to active successfully. The exception trace I see in the log is:
>> > >> >
>> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>> > >> >     ... 4 more
>> > >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> > >> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>> > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>> > >> >     at java.security.AccessController.doPrivileged(Native Method)
>> > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
>> > >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>> > >> >     ... 5 more
>> > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> > >> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>> > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>> > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>> > >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>> > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>> > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>> > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>> > >> >     ... 13 more
>> > >> >
>> > >> > Thanks & Regards
>> > >> > Rohith Sharma K S
>> > >> >
>> > >> > On 4 November 2017 at 04:20, Arun Suresh <asur...@apache.org> wrote:
>> > >> >
>> > >> > > Hi folks,
>> > >> > >
>> > >> > >      Apache Hadoop 2.9.0 is the first stable release of the Hadoop 2.9
>> > >> > > line and will be the latest stable/production release for Apache
>> > >> > > Hadoop. It includes 30 new features with 500+ subtasks, 407
>> > >> > > improvements, and 787 bug fixes newly fixed since 2.8.2.
>> > >> > >
>> > >> > >       More information about the 2.9.0 release plan can be found here:
>> > >> > > https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>> > >> > >
>> > >> > >       New RC is available at:
>> > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>> > >> > >
>> > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>> > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>> > >> > >
>> > >> > >       The maven artifacts are available via repository.apache.org at:
>> > >> > > https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>> > >> > >
>> > >> > >       Please try the release and vote; the vote will run for the usual 5
>> > >> > > days, ending on 11/10/2017 at 4pm PST.
>> > >> > >
>> > >> > > Thanks,
>> > >> > >
>> > >> > > Arun/Subru
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>
