Yes, sure. I'll do another RC for next week. Thank you all for working on this!
On Thu, Jul 9, 2020 at 8:20 AM Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> wrote:
>
> Hi Gabor Bota,
>
> I committed the fix of YARN-10347 to branch-3.1.
> I think this should be a blocker for 3.1.4.
> Could you cherry-pick it to branch-3.1.4 and cut a new RC?
>
> Thanks,
> Masatake Iwasaki
>
> On 2020/07/08 23:31, Masatake Iwasaki wrote:
> > Thanks Steve and Prabhu for the information.
> >
> > The cause turned out to be locking in CapacityScheduler#reinitialize.
> > I think the method is called after transitioning to active state if
> > RM-HA is enabled.
> >
> > I filed YARN-10347 and created a PR.
> >
> >
> > Masatake Iwasaki
> >
> >
> > On 2020/07/08 16:33, Prabhu Joseph wrote:
> >> Hi Masatake,
> >>
> >> The thread is waiting for a ReadLock, so we need to check what the
> >> other thread holding the WriteLock is blocked on.
> >> Can you get three consecutive complete jstack dumps of the
> >> ResourceManager during the issue?
> >>
> >>>> I got no issue if RM-HA is disabled.
> >> Looks like RM is not able to access the Zookeeper State Store. Can you
> >> check if there is any connectivity issue between RM and Zookeeper?
> >>
> >> Thanks,
> >> Prabhu Joseph
> >>
> >>
> >> On Mon, Jul 6, 2020 at 2:44 AM Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> wrote:
> >>
> >>> Thanks for putting this up, Gabor Bota.
> >>>
> >>> I'm testing the RC2 on a 3-node docker cluster with NN-HA and RM-HA
> >>> enabled.
> >>> ResourceManager reproducibly blocks on submitApplication while
> >>> launching example MR jobs.
> >>> Has anyone run into the same issue?
> >>>
> >>> The same configuration worked for 3.1.3.
> >>> I got no issue if RM-HA is disabled.
> >>>
> >>>
> >>> "IPC Server handler 1 on default port 8032" #167 daemon prio=5 os_prio=0 tid=0x00007fe91821ec50 nid=0x3b9 waiting on condition [0x00007fe901bac000]
> >>>    java.lang.Thread.State: WAITING (parking)
> >>>         at sun.misc.Unsafe.park(Native Method)
> >>>         - parking to wait for <0x0000000085d37a40> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> >>>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> >>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> >>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> >>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> >>>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> >>>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521)
> >>>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417)
> >>>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342)
> >>>         at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678)
> >>>         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> >>>         at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> >>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
> >>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
> >>>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015)
> >>>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
> >>>         at java.security.AccessController.doPrivileged(Native Method)
> >>>         at javax.security.auth.Subject.doAs(Subject.java:422)
> >>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> >>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943)
> >>>
> >>>
> >>> Masatake Iwasaki
> >>>
> >>> On 2020/06/26 22:51, Gabor Bota wrote:
> >>>> Hi folks,
> >>>>
> >>>> I have put together a release candidate (RC2) for Hadoop 3.1.4.
> >>>>
> >>>> The RC is available at:
> >>>> http://people.apache.org/~gabota/hadoop-3.1.4-RC2/
> >>>> The RC tag in git is here:
> >>>> https://github.com/apache/hadoop/releases/tag/release-3.1.4-RC2
> >>>> The maven artifacts are staged at
> >>>> https://repository.apache.org/content/repositories/orgapachehadoop-1269/
> >>>>
> >>>>
> >>>> You can find my public key at:
> >>>> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >>>> and http://keys.gnupg.net/pks/lookup?op=get&search=0xB86249D83539B38C
> >>>>
> >>>> Please try the release and vote. The vote will run for 5 weekdays,
> >>>> until July 6, 2020, 23:00 CET.
> >>>>
> >>>> The release includes the revert of HDFS-14941, as it caused
> >>>> HDFS-15421. IBR leak causes standby NN to be stuck in safe mode.
> >>>> (https://issues.apache.org/jira/browse/HDFS-15421)
> >>>> The release includes HDFS-15323, as requested.
> >>>> (https://issues.apache.org/jira/browse/HDFS-15323)
> >>>>
> >>>> Thanks,
> >>>> Gabor
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >>>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >>>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> >>> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >>>
> >>>
> >
>
---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
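
For readers following the diagnosis in this thread (an IPC handler parked on a ReadLock while some other thread holds the WriteLock), here is a minimal, self-contained sketch of that general pattern with java.util.concurrent.locks.ReentrantReadWriteLock. It is not the CapacityScheduler code and not the YARN-10347 fix; the class name, method name, thread name, and timeout are all hypothetical and chosen only for illustration. The reader uses a timed tryLock so the demo terminates instead of parking forever as in the jstack output above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; not part of Hadoop.
public class ReadLockBlockedDemo {

    // Returns true if the calling thread could take the read lock within
    // timeoutMs while a second thread is holding the write lock.
    static boolean readerAcquiresWhileWriterHolds(long timeoutMs) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch writerHasLock = new CountDownLatch(1);
        CountDownLatch releaseWriter = new CountDownLatch(1);

        // Stand-in for a thread that does long-running work under the write lock.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                writerHasLock.countDown();
                releaseWriter.await(); // hold the write lock until told to stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        }, "hypothetical-writer");

        try {
            writer.start();
            writerHasLock.await(); // make sure the write lock is really held

            // A plain readLock().lock() would park here indefinitely;
            // the timed variant gives up after timeoutMs.
            boolean acquired = lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }

            releaseWriter.countDown();
            writer.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during demo", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reader acquired while writer holds lock: "
                + readerAcquiresWhileWriterHolds(200));
    }
}
```

In a real ResourceManager the interesting question is the one Prabhu raised: what the write-lock holder itself is waiting on, which is why several consecutive jstack dumps are useful.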