Thanks Chackra for providing the Slider and NM logs and configs of the cluster. From the logs it seems like a YARN bug, so I went ahead and filed one. I will follow up with the YARN team to see what is causing this -
https://issues.apache.org/jira/browse/YARN-3561

-Gour

On 4/28/15, 7:48 AM, "Gour Saha" <[email protected]> wrote:

>Can you send us the complete-config dump?
>
>-Gour
>
>On 4/28/15, 2:45 AM, "Chackravarthy Esakkimuthu" <[email protected]>
>wrote:
>
>>yes this is the config taken by slider also.
>>
>>http://host2:8088/proxy/application_1428575950531_0016/ws/v1/slider/publisher/slider/complete-config
>>
>>yarn.nodemanager.sleep-delay-before-sigkill.ms: "250"
>>
>>its default value coming from yarn-default.
>>We have not configured it in yarn-site.
>>
>>On Tue, Apr 28, 2015 at 3:03 PM, Chackravarthy Esakkimuthu <
>>[email protected]> wrote:
>>
>>> Following is the config which I get from RM UI,
>>>
>>> http://host2:8088/conf
>>>
>>> <property>
>>> <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
>>> <value>250</value>
>>> <source>yarn-default.xml</source>
>>> </property>
>>>
>>> On Tue, Apr 28, 2015 at 2:50 PM, Steve Loughran
>>> <[email protected]> wrote:
>>>
>>>> > On 28 Apr 2015, at 10:07, Chackravarthy Esakkimuthu <
>>>> > [email protected]> wrote:
>>>> >
>>>> > sure, will send you the logs.
>>>> >
>>>> > And the same pattern follows for hbase installation also.
>>>> > 'stop' command stops only SliderAM.
>>>> > 'destroy' command stops HMaster and RegionServer only.. HBASE_REST and
>>>> > THRIFT_2 still running after destroy command, And slider agents running in
>>>> > all 4 hosts where container was launched.
>>>> >
>>>>
>>>> do you have YARN set up to actually kill processes when the containers
>>>> are released.?
>>>>
>>>> For example:
>>>>
>>>> <!--time before the process gets a -9 -->
>>>> <property>
>>>> <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
>>>> <value>30000</value>
>>>> </property>
>>>>
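
For anyone repeating the check discussed in this thread, here is a minimal Python sketch of reading a property's value and source out of the kind of XML that http://host2:8088/conf returns. It only parses a sample copied from the message above and makes no network call. The <configuration> wrapper element, the SAMPLE_CONF name and the lookup_property helper are assumptions made for illustration and are not taken from the thread.

```python
import xml.etree.ElementTree as ET

# Sample copied from the RM /conf output quoted in this thread.
# The <configuration> root element is an assumption added so the
# fragment parses as a single XML document.
SAMPLE_CONF = """<configuration>
<property>
<name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
<value>250</value>
<source>yarn-default.xml</source>
</property>
</configuration>"""


def lookup_property(conf_xml, key):
    """Return (value, source) for the named property, or None if absent.

    Hypothetical helper: walks every <property> element and compares
    its <name> child against the requested key.
    """
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value"), prop.findtext("source")
    return None


if __name__ == "__main__":
    key = "yarn.nodemanager.sleep-delay-before-sigkill.ms"
    print(lookup_property(SAMPLE_CONF, key))
```

A source of yarn-default.xml indicates the value is the shipped default rather than an override from yarn-site.xml, which matches what was reported above.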
