You are probably right that this file is not deleted on a stop. I don't have a cluster with Slider to quickly test this. YARN Service keeps us busy you know :)
But if you can replicate this, you should file a JIRA.

-Gour

On 1/17/18, 5:02 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:

>Gour,
>
>Thanks for the prompt reply.
>
>   1. A temporary hiccup in HDFS as a possible cause has been on my mind as
>      well; I wanted to reach out to the Slider community to check if there
>      were other issues causing this symptom.
>   2. I remember I had stopped and started the Slider app after this
>      timestamp. Apparently app stop/start did not delete this file. Can you
>      confirm that behaviour? Also, would it make sense to have an
>      enhancement to delete this file on app stop/start, if that is indeed
>      not being done?
>
>Thanks,
>
>Manoj
>
>On Wed, Jan 17, 2018 at 1:50 PM, Gour Saha <gs...@hortonworks.com> wrote:
>
>> Manoj,
>> By any chance is it possible to find out (maybe from logs or sar files) if
>> there was HDFS unavailability (say NN node connection issue) around the
>> time of 2018-01-06 00:33 (based on the readlock file timestamp)?
>>
>> -rw-r--r-- 3 xxx xxx 23 2018-01-06 00:33 hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>>
>> -Gour
>>
>> On 1/17/18, 1:05 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>
>> >Hello,
>> >
>> >Slider version 0.80 on CDH 5.5.1 cluster with kerberos
>> >
>> >Slider upgrade <App> --template /xxx/appConfig.json --resources /xxx/resources.json --queue tenant --force failed with the following trace
>> >
>> >2018-01-17 20:31:23,030 [main] INFO tools.SliderUtils - JVM initialized into secure mode with kerberos realm BIGDATA
>> >2018-01-17 20:31:23,869 [main] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>> >2018-01-17 20:31:24,325 [main] WARN client.SliderClient - Failed to get a Lock on Builder working with spas at hdfs://xxx/user/xxx/.slider/cluster/spas : org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>> >org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>> > at org.apache.slider.core.persist.ConfPersister.acquireWritelock(ConfPersister.java:141)
>> > at org.apache.slider.core.persist.ConfPersister.save(ConfPersister.java:253)
>> > at org.apache.slider.core.build.InstanceBuilder.persist(InstanceBuilder.java:270)
>> > at org.apache.slider.client.SliderClient.persistInstanceDefinition(SliderClient.java:1836)
>> > at org.apache.slider.client.SliderClient.buildInstanceDefinition(SliderClient.java:1734)
>> > at org.apache.slider.client.SliderClient.actionUpgrade(SliderClient.java:802)
>> > at org.apache.slider.client.SliderClient.exec(SliderClient.java:542)
>> > at org.apache.slider.client.SliderClient.runService(SliderClient.java:424)
>> > at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>> > at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>> > at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>> > at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>> > at org.apache.slider.Slider.main(Slider.java:49)
>> >2018-01-17 20:31:24,327 [main] ERROR main.ServiceLauncher - Failed to save spas: org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>> >2018-01-17 20:31:24,328 [main] INFO util.ExitUtil - Exiting with status 70
>> >
>> >An HDFS ls listing showed that a readlock file was created a few days back
>> >
>> >hdfs dfs -ls hdfs://xxx/user/xxx/.slider/cluster/spas
>> >...
>> >-rw-r--r-- 3 xxx xxx 23 2018-01-06 00:33 hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>> >...
>> >
>> >After deleting this file manually, the upgrade command works.
>> >
>> >Any idea when this file is created and why it was not removed?
>> >
>> >Thanks in advance,
>> >
>> >Manoj
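
The manual workaround described in the thread (spotting a leftover readlock file and deleting it) can be sketched as a small shell helper. This is only an illustration, not something Slider ships: the function names `is_stale` and `check_lock`, the `--delete` switch, and the age threshold are all hypothetical. It relies on the standard `hdfs dfs -test`, `-stat` and `-rm` commands. Deleting the lock is only reasonable when no other Slider client is working on the same application.

```shell
#!/usr/bin/env bash
# Sketch: report (and optionally remove) a leftover Slider lock file in HDFS.
# Assumption: function names, the --delete switch and the threshold are illustrative.

# Succeeds when a file last modified at $1 (ms since epoch) is more than
# $3 seconds old relative to $2 (current time, ms since epoch).
is_stale() {
  local mtime_ms=$1 now_ms=$2 max_age_s=$3
  [ $(( (now_ms - mtime_ms) / 1000 )) -gt "$max_age_s" ]
}

# Inspect the lock file at $1; remove it only if it is older than $2 seconds
# and the caller passes --delete as the third argument.
check_lock() {
  local lock_path=$1 max_age_s=$2 action=${3:-}
  # hdfs dfs -test -e exits 0 when the path exists.
  if ! hdfs dfs -test -e "$lock_path"; then
    echo "no lock file at $lock_path"
    return 0
  fi
  local mtime_ms now_ms
  mtime_ms=$(hdfs dfs -stat %Y "$lock_path")   # modification time, ms since epoch
  now_ms=$(( $(date +%s) * 1000 ))
  if is_stale "$mtime_ms" "$now_ms" "$max_age_s"; then
    echo "stale lock: $lock_path"
    if [ "$action" = "--delete" ]; then
      hdfs dfs -rm "$lock_path"
    fi
  else
    echo "recent lock, leaving in place: $lock_path"
  fi
}
```

For example, `check_lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock 3600` would only report on the file, while adding `--delete` as a third argument would remove it once it is more than an hour old. Whether Slider itself should clean this file up on app stop/start is the open question in the thread above.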