Manoj,

Is it possible to find out (maybe from logs or sar files) whether HDFS was unavailable (say, a NameNode connection issue) around 2018-01-06 00:33, the timestamp on the readlock file?

-rw-r--r-- 3 xxx xxx 23 2018-01-06 00:33 hdfs://xxx/user/xxx/.slider/cluster/spas/readlock

-Gour

On 1/17/18, 1:05 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:

>Hello,
>
>Slider version 0.80 on CDH 5.5.1 cluster with kerberos
>
>Slider upgrade <App> --template /xxx/appConfig.json --resources /xxx/resources.json --queue tenant --force failed with following trace
>
>2018-01-17 20:31:23,030 [main] INFO tools.SliderUtils - JVM initialized into secure mode with kerberos realm BIGDATA
>2018-01-17 20:31:23,869 [main] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>2018-01-17 20:31:24,325 [main] WARN client.SliderClient - Failed to get a Lock on Builder working with spas at hdfs://xxx/user/xxx/.slider/cluster/spas : org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>    at org.apache.slider.core.persist.ConfPersister.acquireWritelock(ConfPersister.java:141)
>    at org.apache.slider.core.persist.ConfPersister.save(ConfPersister.java:253)
>    at org.apache.slider.core.build.InstanceBuilder.persist(InstanceBuilder.java:270)
>    at org.apache.slider.client.SliderClient.persistInstanceDefinition(SliderClient.java:1836)
>    at org.apache.slider.client.SliderClient.buildInstanceDefinition(SliderClient.java:1734)
>    at org.apache.slider.client.SliderClient.actionUpgrade(SliderClient.java:802)
>    at org.apache.slider.client.SliderClient.exec(SliderClient.java:542)
>    at org.apache.slider.client.SliderClient.runService(SliderClient.java:424)
>    at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>    at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>    at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>    at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>    at org.apache.slider.Slider.main(Slider.java:49)
>2018-01-17 20:31:24,327 [main] ERROR main.ServiceLauncher - Failed to save spas: org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>2018-01-17 20:31:24,328 [main] INFO util.ExitUtil - Exiting with status 70
>
>HDFS ls listing showed a file readlock was created few days back
>
>hdfs dfs -ls hdfs://xxx/user/xxx/.slider/cluster/spas
>...
>-rw-r--r-- 3 xxx xxx 23 2018-01-06 00:33 hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>...
>
>After deleting this file manually, the upgrade command works.
>
>Any idea when is this file created and why it was not removed ?
>
>Thanks in advance,
>
>Manoj