Thanks. I updated the YARN bug with the OS info. I saw that RM HA is disabled.

By the way, there is a patch submitted by YARN for the RM HA issue
(https://issues.apache.org/jira/browse/SLIDER-846), as part of the YARN bug
https://issues.apache.org/jira/browse/YARN-2605. If you want, I can provide
you a patch to test, if you are okay to get a jar from us.

-Gour

On 4/29/15, 11:18 AM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:

>OS installed is debian 7.
>And as I was facing issue (components were not starting) with RM HA
>enabled, I am testing it with RM HA disabled only. And yes, still NN HA is
>still enabled in the cluster.
>
>On Wed, Apr 29, 2015 at 11:37 PM, Gour Saha <[email protected]> wrote:
>
>> Unfortunately we haven't reproduced this issue in the envs we usually test
>> on. We might have to create an exact replica of your cluster (with RM HA,
>> NN HA, OS version, # of nodes, etc.) to be able to reproduce it. The YARN
>> team is looking into this issue.
>>
>> By the way, what is the OS and version of the nodes in your cluster?
>>
>> -Gour
>>
>> On 4/29/15, 10:49 AM, "Chackravarthy Esakkimuthu" <[email protected]>
>> wrote:
>>
>> >sure Gour, Thanks for helping out.
>> >Do you also see these kind of issues? Is it reproducible for you as well?
>> >
>> >On Wed, Apr 29, 2015 at 8:58 PM, Gour Saha <[email protected]> wrote:
>> >
>> >> Thanks Chackra for providing the Slider and NM logs and configs of the
>> >> cluster. From the logs it seems like a YARN bug, so I went ahead and
>> >> filed one. I will follow up with the YARN team to see what is causing
>> >> this -
>> >>
>> >> https://issues.apache.org/jira/browse/YARN-3561
>> >>
>> >>
>> >> -Gour
>> >>
>> >> On 4/28/15, 7:48 AM, "Gour Saha" <[email protected]> wrote:
>> >>
>> >> >Can you send us the complete-config dump?
>> >> >
>> >> >-Gour
>> >> >
>> >> >On 4/28/15, 2:45 AM, "Chackravarthy Esakkimuthu" <[email protected]>
>> >> >wrote:
>> >> >
>> >> >>yes this is the config taken by slider also.
>> >> >>
>> >> >>http://host2:8088/proxy/application_1428575950531_0016/ws/v1/slider/publisher/slider/complete-config
>> >> >>
>> >> >>yarn.nodemanager.sleep-delay-before-sigkill.ms: "250"
>> >> >>
>> >> >>its default value coming from yarn-default.
>> >> >>We have not configured it in yarn-site.
>> >> >>
>> >> >>On Tue, Apr 28, 2015 at 3:03 PM, Chackravarthy Esakkimuthu <
>> >> >>[email protected]> wrote:
>> >> >>
>> >> >>> Following is the config which I get from RM UI,
>> >> >>>
>> >> >>> http://host2:8088/conf
>> >> >>>
>> >> >>> <property>
>> >> >>> <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
>> >> >>> <value>250</value>
>> >> >>> <source>yarn-default.xml</source>
>> >> >>> </property>
>> >> >>>
>> >> >>> On Tue, Apr 28, 2015 at 2:50 PM, Steve Loughran <[email protected]>
>> >> >>> wrote:
>> >> >>>
>> >> >>>>
>> >> >>>> > On 28 Apr 2015, at 10:07, Chackravarthy Esakkimuthu <
>> >> >>>> [email protected]> wrote:
>> >> >>>> >
>> >> >>>> > sure, will send you the logs.
>> >> >>>> >
>> >> >>>> > And the same pattern follows for hbase installation also.
>> >> >>>> > 'stop' command stops only SliderAM.
>> >> >>>> > 'destroy' command stops HMaster and RegionServer only.. HBASE_REST and
>> >> >>>> > THRIFT_2 still running after destroy command, And slider agents running in
>> >> >>>> > all 4 hosts where container was launched.
>> >> >>>> >
>> >> >>>>
>> >> >>>>
>> >> >>>> do you have YARN set up to actually kill processes when the containers
>> >> >>>> are released.?
>> >> >>>>
>> >> >>>> For example:
>> >> >>>>
>> >> >>>> <!--time before the process gets a -9 -->
>> >> >>>> <property>
>> >> >>>> <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
>> >> >>>> <value>30000</value>
>> >> >>>> </property>
>> >> >>>>
