Unfortunately we haven't reproduced this issue in the envs we usually test on. We might have to create an exact replica of your cluster (with RM HA, NN HA, OS version, # of nodes, etc.) to be able to reproduce it. The YARN team is looking into this issue.

By the way, what is the OS and version of the nodes in your cluster?

-Gour

On 4/29/15, 10:49 AM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:

>sure Gour, Thanks for helping out.
>Do you also see these kind of issues? Is it reproducible for you as well?
>
>On Wed, Apr 29, 2015 at 8:58 PM, Gour Saha <[email protected]> wrote:
>
>> Thanks Chackra for providing the Slider and NM logs and configs of the
>> cluster. From the logs it seems like a YARN bug, so I went ahead and
>> filed one. I will follow up with the YARN team to see what is causing this -
>>
>> https://issues.apache.org/jira/browse/YARN-3561
>>
>> -Gour
>>
>> On 4/28/15, 7:48 AM, "Gour Saha" <[email protected]> wrote:
>>
>> >Can you send us the complete-config dump?
>> >
>> >-Gour
>> >
>> >On 4/28/15, 2:45 AM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>> >
>> >>yes this is the config taken by slider also.
>> >>
>> >>http://host2:8088/proxy/application_1428575950531_0016/ws/v1/slider/publisher/slider/complete-config
>> >>
>> >>yarn.nodemanager.sleep-delay-before-sigkill.ms: "250"
>> >>
>> >>its default value coming from yarn-default.
>> >>We have not configured it in yarn-site.
>> >>
>> >>On Tue, Apr 28, 2015 at 3:03 PM, Chackravarthy Esakkimuthu <[email protected]> wrote:
>> >>
>> >>> Following is the config which I get from RM UI,
>> >>>
>> >>> http://host2:8088/conf
>> >>>
>> >>> <property>
>> >>> <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
>> >>> <value>250</value>
>> >>> <source>yarn-default.xml</source>
>> >>> </property>
>> >>>
>> >>> On Tue, Apr 28, 2015 at 2:50 PM, Steve Loughran <[email protected]> wrote:
>> >>>
>> >>>> > On 28 Apr 2015, at 10:07, Chackravarthy Esakkimuthu <[email protected]> wrote:
>> >>>> >
>> >>>> > sure, will send you the logs.
>> >>>> >
>> >>>> > And the same pattern follows for hbase installation also.
>> >>>> > 'stop' command stops only SliderAM.
>> >>>> > 'destroy' command stops HMaster and RegionServer only.. HBASE_REST and
>> >>>> > THRIFT_2 still running after destroy command, And slider agents running in
>> >>>> > all 4 hosts where container was launched.
>> >>>> >
>> >>>>
>> >>>> do you have YARN set up to actually kill processes when the containers
>> >>>> are released.?
>> >>>>
>> >>>> For example:
>> >>>>
>> >>>> <!--time before the process gets a -9 -->
>> >>>> <property>
>> >>>> <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
>> >>>> <value>30000</value>
>> >>>> </property>
>> >>>>
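
The check described in the thread (reading a property's effective value and where it came from out of the RM's /conf output) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample XML is the property block quoted above, wrapped in a <configuration> root element that is assumed here and not shown in the thread, and lookup_property is a hypothetical helper name, not part of YARN or Slider.

```python
import xml.etree.ElementTree as ET

# Property block as quoted in the thread. The <configuration> wrapper is an
# assumption for this sketch; the thread shows only the <property> element.
SAMPLE_CONF = """<configuration>
  <property>
    <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
    <value>250</value>
    <source>yarn-default.xml</source>
  </property>
</configuration>"""


def lookup_property(conf_xml, name):
    """Return (value, source) for the named property, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(conf_xml)
    # iter() walks all descendants, so the name of the root element does not matter.
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value"), prop.findtext("source")
    return None


if __name__ == "__main__":
    key = "yarn.nodemanager.sleep-delay-before-sigkill.ms"
    print(lookup_property(SAMPLE_CONF, key))
```

A source of yarn-default.xml, as in the output quoted above, indicates that the value is the shipped default and was not overridden in yarn-site. To run this against a live cluster, the text fetched from the RM's /conf URL would take the place of SAMPLE_CONF.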
