Yes, sure. I'll do another RC for next week.

Thank you all for working on this!

On Thu, Jul 9, 2020 at 8:20 AM Masatake Iwasaki
<iwasak...@oss.nttdata.co.jp> wrote:
>
> Hi Gabor Bota,
>
> I committed the fix of YARN-10347 to branch-3.1.
> I think this should be a blocker for 3.1.4.
> Could you cherry-pick it to branch-3.1.4 and cut a new RC?
>
> Thanks,
> Masatake Iwasaki
>
> On 2020/07/08 23:31, Masatake Iwasaki wrote:
> > Thanks Steve and Prabhu for the information.
> >
> > The cause turned out to be locking in CapacityScheduler#reinitialize.
> > I think the method is called after transitioning to active state if
> > RM-HA is enabled.
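
As an illustration of how a write lock that is never fully released would produce this symptom, here is a minimal Java sketch. The thread does not spell out what exactly was wrong with the locking, so the unbalanced lock/unlock below is only an assumed failure mode, and StuckWriteLockSketch, reinitialize() and readerCanProceed() are hypothetical names, not CapacityScheduler code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StuckWriteLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Hypothetical stand-in for a reinitialize-style method. It takes the
    // reentrant write lock twice but releases it only once (assumed bug),
    // so the calling thread still owns the write lock when it returns.
    void reinitialize() {
        lock.writeLock().lock();
        lock.writeLock().lock(); // second, unbalanced acquisition
        try {
            // ... reload configuration here ...
        } finally {
            lock.writeLock().unlock(); // hold count drops from 2 to 1
        }
    }

    // Write-lock hold count of the calling thread.
    int writeHoldCount() {
        return lock.getWriteHoldCount();
    }

    // Mimics an RPC handler thread that needs the read lock. It gives up
    // after a timeout instead of parking forever as in the jstack output.
    boolean readerCanProceed(long timeoutMillis) throws InterruptedException {
        final boolean[] acquired = {false};
        Thread handler = new Thread(() -> {
            try {
                if (lock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "handler-sketch");
        handler.start();
        handler.join(); // join makes the handler's write to acquired[0] visible
        return acquired[0];
    }

    public static void main(String[] args) throws InterruptedException {
        StuckWriteLockSketch s = new StuckWriteLockSketch();
        System.out.println("reader before reinitialize: " + s.readerCanProceed(200));
        s.reinitialize();
        System.out.println("reader after reinitialize:  " + s.readerCanProceed(200));
        System.out.println("write holds left on caller: " + s.writeHoldCount());
    }
}
```

The reader runs on a separate thread on purpose: a thread that already owns the write lock may still take the read lock itself, so the blocking only shows up for other threads, as it did for the IPC handler.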
> >
> > I filed YARN-10347 and created a PR.
> >
> >
> > Masatake Iwasaki
> >
> >
> > On 2020/07/08 16:33, Prabhu Joseph wrote:
> >> Hi Masatake,
> >>
> >> The thread is waiting for a ReadLock; we need to check what the other
> >> thread holding the WriteLock is blocked on.
> >> Can you capture three consecutive, complete jstack dumps of the
> >> ResourceManager while the issue is occurring?
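
For reference, the owner of a write lock can also be looked up from inside a JVM with the standard java.lang.management API, which exposes the same "locked ownable synchronizers" information that jstack -l prints. This is a minimal sketch, assuming a JVM that supports synchronizer usage monitoring; LockOwnerDumpSketch, ownersOf() and demo() are hypothetical names and not part of Hadoop.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOwnerDumpSketch {

    // Returns the names of live threads that currently own an ownable
    // synchronizer (for example the write side of a ReentrantReadWriteLock)
    // whose class name contains the given text.
    static List<String> ownersOf(String classNamePart) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> owners = new ArrayList<>();
        if (!mx.isSynchronizerUsageSupported()) {
            return owners; // this JVM cannot report owned synchronizers
        }
        // false: skip monitor details, true: include owned synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(false, true)) {
            if (info == null) {
                continue;
            }
            for (LockInfo held : info.getLockedSynchronizers()) {
                if (held.getClassName().contains(classNamePart)) {
                    owners.add(info.getThreadName());
                }
            }
        }
        return owners;
    }

    // Starts a thread that holds a write lock, takes a dump while the lock
    // is held, then lets the thread release the lock and finish.
    static List<String> demo() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        }, "writer-sketch");
        writer.start();
        held.await(); // wait until the write lock is really held
        List<String> owners = ownersOf("ReentrantReadWriteLock");
        release.countDown();
        writer.join();
        return owners;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("write lock owners: " + demo());
    }
}
```

Taking several dumps a few seconds apart, as requested above, helps distinguish a thread that is permanently stuck from one that merely holds the lock for a long time.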
> >>
> >>>> I got no issue if RM-HA is disabled.
> >> It looks like the RM is not able to access the ZooKeeper state store.
> >> Can you check whether there is any connectivity issue between the RM
> >> and ZooKeeper?
> >>
> >> Thanks,
> >> Prabhu Joseph
> >>
> >>
> >> On Mon, Jul 6, 2020 at 2:44 AM Masatake Iwasaki
> >> <iwasak...@oss.nttdata.co.jp>
> >> wrote:
> >>
> >>> Thanks for putting this up, Gabor Bota.
> >>>
> >>> I'm testing RC2 on a 3-node Docker cluster with NN-HA and RM-HA
> >>> enabled. ResourceManager reproducibly blocks on submitApplication
> >>> while launching example MR jobs.
> >>> Has anyone run into the same issue?
> >>>
> >>> The same configuration worked for 3.1.3.
> >>> I got no issue if RM-HA is disabled.
> >>>
> >>>
> >>> "IPC Server handler 1 on default port 8032" #167 daemon prio=5 os_prio=0 tid=0x00007fe91821ec50 nid=0x3b9 waiting on condition [0x00007fe901bac000]
> >>>    java.lang.Thread.State: WAITING (parking)
> >>>         at sun.misc.Unsafe.park(Native Method)
> >>>         - parking to wait for  <0x0000000085d37a40> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> >>>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> >>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> >>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> >>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> >>>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> >>>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521)
> >>>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417)
> >>>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342)
> >>>         at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678)
> >>>         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> >>>         at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> >>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
> >>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
> >>>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015)
> >>>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
> >>>         at java.security.AccessController.doPrivileged(Native Method)
> >>>         at javax.security.auth.Subject.doAs(Subject.java:422)
> >>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> >>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943)
> >>>
> >>>
> >>> Masatake Iwasaki
> >>>
> >>> On 2020/06/26 22:51, Gabor Bota wrote:
> >>>> Hi folks,
> >>>>
> >>>> I have put together a release candidate (RC2) for Hadoop 3.1.4.
> >>>>
> >>>> The RC is available at:
> >>>> http://people.apache.org/~gabota/hadoop-3.1.4-RC2/
> >>>> The RC tag in git is here:
> >>>> https://github.com/apache/hadoop/releases/tag/release-3.1.4-RC2
> >>>> The maven artifacts are staged at
> >>>> https://repository.apache.org/content/repositories/orgapachehadoop-1269/
> >>>>
> >>>>
> >>>> You can find my public key at:
> >>>> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >>>> and http://keys.gnupg.net/pks/lookup?op=get&search=0xB86249D83539B38C
> >>>>
> >>>> Please try the release and vote. The vote will run for 5 weekdays,
> >>>> until July 6, 2020, 23:00 CET.
> >>>>
> >>>> The release includes the revert of HDFS-14941, as it caused
> >>>> HDFS-15421, "IBR leak causes standby NN to be stuck in safe mode"
> >>>> (https://issues.apache.org/jira/browse/HDFS-15421).
> >>>> The release includes HDFS-15323, as requested.
> >>>> (https://issues.apache.org/jira/browse/HDFS-15323)
> >>>>
> >>>> Thanks,
> >>>> Gabor
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >>>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >>>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> >>> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >>>
> >>>

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
