You are probably right that this file is not deleted on a stop. I don't
have a cluster with Slider to quickly test this. YARN Service keeps us
busy you know :) 

But if you can replicate this, you should file a JIRA.
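
If it helps with replicating this, here is a rough sketch (not part of Slider) for spotting a leftover lock file in the output of `hdfs dfs -ls`. The function name `find_stale_locks`, the one-hour threshold, and the `writelock` file name are assumptions made for illustration; only `readlock` appears in this thread. It assumes the usual eight-column listing format with each entry on one line, and that the listing times and `now` are in the same time zone.

```python
from datetime import datetime, timedelta

# Assumed lock file names; only "readlock" is confirmed by this thread.
LOCK_NAMES = ("readlock", "writelock")


def find_stale_locks(ls_output, now, max_age=timedelta(hours=1)):
    """Return (path, age) pairs for lock files older than max_age.

    ls_output is the text printed by `hdfs dfs -ls <dir>`.
    The one-hour default is an arbitrary threshold, not a Slider setting.
    """
    stale = []
    for line in ls_output.splitlines():
        # A file entry has 8 fields: permissions, replication, owner,
        # group, size, date, time, path. Lines such as "Found N items"
        # have fewer fields and are skipped.
        fields = line.split(None, 7)
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue
        path = fields[7]
        if path.rsplit("/", 1)[-1] not in LOCK_NAMES:
            continue
        modified = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        age = now - modified
        if age > max_age:
            stale.append((path, age))
    return stale


# Example using the listing line quoted in this thread.
listing = (
    "-rw-r--r--   3 xxx xxx         23 2018-01-06 00:33 "
    "hdfs://xxx/user/xxx/.slider/cluster/spas/readlock"
)
for path, age in find_stale_locks(listing, now=datetime(2018, 1, 17, 20, 31)):
    print(path, "is", age.days, "days old")
```

The sketch only reports candidates. Whether it is safe to remove a given lock file still has to be decided by hand, for example by making sure no other Slider client is working on the same app at that moment.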

-Gour

On 1/17/18, 5:02 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:

>Gour,
>
>Thanks for the prompt reply.
>
>
>   1. A temporary hiccup in HDFS as a possible cause has been on my mind
>   as well. I wanted to reach out to the Slider community to check if
>   there were other issues causing this symptom.
>   2. I remember I had stopped and started the Slider app after this
>   timestamp. Apparently app stop/start did not delete this file. Can you
>   confirm that behaviour? Also, would it make sense to have an
>   enhancement to delete this file on app stop/start, if that is indeed
>   not being done?
>
>Thanks,
>
>Manoj
>
>On Wed, Jan 17, 2018 at 1:50 PM, Gour Saha <gs...@hortonworks.com> wrote:
>
>> Manoj,
>> By any chance is it possible to find out (maybe from logs or sar files)
>> if there was HDFS unavailability (say NN node connection issue) around the
>> time of 2018-01-06 00:33 (based on the readlock file timestamp)?
>>
>> -rw-r--r--   3 xxx xxx         23 2018-01-06 00:33 hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>>
>>
>> -Gour
>>
>> On 1/17/18, 1:05 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>
>> >Hello,
>> >
>> >Slider version 0.80 on CDH 5.5.1 cluster with kerberos
>> >
>> >Slider upgrade <App> --template /xxx/appConfig.json --resources
>> >/xxx/resources.json --queue tenant --force failed with the following trace:
>> >
>> >2018-01-17 20:31:23,030 [main] INFO  tools.SliderUtils - JVM initialized into secure mode with kerberos realm BIGDATA
>> >2018-01-17 20:31:23,869 [main] INFO  client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>> >2018-01-17 20:31:24,325 [main] WARN  client.SliderClient - Failed to get a Lock on Builder working with spas at hdfs://xxx/user/xxx/.slider/cluster/spas : org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>> >org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>> >    at org.apache.slider.core.persist.ConfPersister.acquireWritelock(ConfPersister.java:141)
>> >    at org.apache.slider.core.persist.ConfPersister.save(ConfPersister.java:253)
>> >    at org.apache.slider.core.build.InstanceBuilder.persist(InstanceBuilder.java:270)
>> >    at org.apache.slider.client.SliderClient.persistInstanceDefinition(SliderClient.java:1836)
>> >    at org.apache.slider.client.SliderClient.buildInstanceDefinition(SliderClient.java:1734)
>> >    at org.apache.slider.client.SliderClient.actionUpgrade(SliderClient.java:802)
>> >    at org.apache.slider.client.SliderClient.exec(SliderClient.java:542)
>> >    at org.apache.slider.client.SliderClient.runService(SliderClient.java:424)
>> >    at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>> >    at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>> >    at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
>> >    at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
>> >    at org.apache.slider.Slider.main(Slider.java:49)
>> >2018-01-17 20:31:24,327 [main] ERROR main.ServiceLauncher - Failed to save spas: org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>> >2018-01-17 20:31:24,328 [main] INFO  util.ExitUtil - Exiting with status 70
>> >
>> >An HDFS ls listing showed that a readlock file had been created a few days back:
>> >
>> >hdfs dfs -ls hdfs://xxx/user/xxx/.slider/cluster/spas
>> >...
>> >-rw-r--r--   3 xxx xxx         23 2018-01-06 00:33 hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>> >...
>> >
>> >After deleting this file manually, the upgrade command works.
>> >
>> >Any idea when this file is created and why it was not removed?
>> >
>> >Thanks in advance,
>> >
>> >Manoj
>>
>>
