Unfortunately we haven't reproduced this issue in the envs we usually test
on. We might have to create an exact replica of your cluster (with RM HA,
NN HA, OS version, # of nodes, etc.) to be able to reproduce it. The YARN
team is looking into this issue.

By the way, what is the OS and version of the nodes in your cluster?

-Gour

On 4/29/15, 10:49 AM, "Chackravarthy Esakkimuthu" <[email protected]>
wrote:

>sure Gour, Thanks for helping out.
>Do you also see this kind of issue? Is it reproducible for you as well?
>
>On Wed, Apr 29, 2015 at 8:58 PM, Gour Saha <[email protected]> wrote:
>
>> Thanks Chackra for providing the Slider and NM logs and configs of the
>> cluster. From the logs it seems like a YARN bug, so I went ahead and filed
>> one. I will follow up with the YARN team to see what is causing this -
>>
>> https://issues.apache.org/jira/browse/YARN-3561
>>
>>
>> -Gour
>>
>> On 4/28/15, 7:48 AM, "Gour Saha" <[email protected]> wrote:
>>
>> >Can you send us the complete-config dump?
>> >
>> >-Gour
>> >
>> >On 4/28/15, 2:45 AM, "Chackravarthy Esakkimuthu" <[email protected]>
>> >wrote:
>> >
>> >>yes this is the config taken by slider also.
>> >>
>> >>
>> >>http://host2:8088/proxy/application_1428575950531_0016/ws/v1/slider/publisher/slider/complete-config
>> >>
>> >>yarn.nodemanager.sleep-delay-before-sigkill.ms: "250"
>> >>
>> >>It's the default value coming from yarn-default.
>> >>We have not configured it in yarn-site.
>> >>
>> >>On Tue, Apr 28, 2015 at 3:03 PM, Chackravarthy Esakkimuthu <
>> >>[email protected]> wrote:
>> >>
>> >>> Following is the config which I get from RM UI,
>> >>>
>> >>> http://host2:8088/conf
>> >>>
>> >>> <property>
>> >>> <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
>> >>> <value>250</value>
>> >>> <source>yarn-default.xml</source>
>> >>> </property>
>> >>>
>> >>> On Tue, Apr 28, 2015 at 2:50 PM, Steve Loughran <[email protected]>
>> >>> wrote:
>> >>>
>> >>>>
>> >>>> > On 28 Apr 2015, at 10:07, Chackravarthy Esakkimuthu <
>> >>>> [email protected]> wrote:
>> >>>> >
>> >>>> > sure, will send you the logs.
>> >>>> >
>> >>>> > And the same pattern follows for hbase installation also.
>> >>>> > 'stop' command stops only SliderAM.
>> >>>> > 'destroy' command stops HMaster and RegionServer only.. HBASE_REST and
>> >>>> > THRIFT_2 still running after destroy command, And slider agents running in
>> >>>> > all 4 hosts where container was launched.
>> >>>> >
>> >>>>
>> >>>>
>> >>>>
>> >>>> do you have YARN set up to actually kill processes when the containers
>> >>>> are released?
>> >>>>
>> >>>> For example:
>> >>>>
>> >>>> <!--time before the process gets a -9 -->
>> >>>> <property>
>> >>>>   <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
>> >>>>   <value>30000</value>
>> >>>> </property>
>> >>>>
>> >>>
>> >>>
>> >
>>
>>
