I have been looking at some of the YARN code, in
RMAppAttemptImpl.java, BaseFinalTransition:
    case KILLED:
    {
      // don't leave the tracking URL pointing to a non-existent AM
      appAttempt.setTrackingUrlToRMAppPage();
      appAttempt.invalidateAMHostAndPort();
      appEvent =
          new RMAppFailedAttemptEvent(applicationId,
              RMAppEventType.ATTEMPT_KILLED,
              "Application killed by user.", false);
    }
    break;
    case FAILED:
    {
      // don't leave the tracking URL pointing to a non-existent AM
      appAttempt.setTrackingUrlToRMAppPage();
      appAttempt.invalidateAMHostAndPort();
      if (appAttempt.submissionContext
          .getKeepContainersAcrossApplicationAttempts()
          && !appAttempt.isLastAttempt
          && !appAttempt.submissionContext.getUnmanagedAM()) {
        keepContainersAcrossAppAttempts = true;
      }
      appEvent =
          new RMAppFailedAttemptEvent(applicationId,
              RMAppEventType.ATTEMPT_FAILED, appAttempt.getDiagnostics(),
              keepContainersAcrossAppAttempts);
    }
When the AM container fails, the attempt may be restarted and recover its
state (the containers can be kept across attempts), but when it is killed,
it follows a different flow and the containers are never kept.
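
To make the difference concrete, here is a minimal sketch of how I read that
branch. It is not YARN code: the class KeepContainersSketch, the enum
FinalState and the method keepContainersAcrossAttempts are hypothetical names
I made up, and the sketch only models the boolean that gets passed to
RMAppFailedAttemptEvent in the excerpt above.

```java
// Simplified model of the decision in the excerpt above.
// All names here are hypothetical stand-ins, not YARN classes.
final class KeepContainersSketch {

    // The two final states shown in the excerpt.
    enum FinalState { KILLED, FAILED }

    // Returns the value the excerpt would pass as the last argument
    // of RMAppFailedAttemptEvent.
    static boolean keepContainersAcrossAttempts(FinalState state,
                                                boolean keepRequested,
                                                boolean isLastAttempt,
                                                boolean unmanagedAM) {
        switch (state) {
            case FAILED:
                // Containers are kept only if the application asked for it,
                // another attempt is still allowed, and the AM is managed.
                return keepRequested && !isLastAttempt && !unmanagedAM;
            case KILLED:
            default:
                // The KILLED branch always passes false.
                return false;
        }
    }

    public static void main(String[] args) {
        // Failed AM, keep-containers requested, not the last attempt.
        System.out.println(keepContainersAcrossAttempts(
            FinalState.FAILED, true, false, false)); // prints "true"
        // Killed AM with the same settings.
        System.out.println(keepContainersAcrossAttempts(
            FinalState.KILLED, true, false, false)); // prints "false"
    }
}
```

If this reading is right, whether the containers survive depends on whether
the RM sees the attempt as FAILED or KILLED, not only on what the
application requested.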
Later, I ran some experiments in pseudo-distributed mode.

Experiment 1:
$ kill SliderAppMaster      # all containers restart

Experiment 2:
$ kill -6 SliderAppMaster   # only the AM restarts
$ kill SliderAppMaster      # only the AM restarts

It's very interesting.

2014-12-16 18:34 GMT+08:00 Steve Loughran <[email protected]>:
>
> the suicide operation is only there for testing, for demonstrating that AM
> restart takes place. It lets us
> 1. kill an AM on a remote cluster where we don't have the rights to SSH in
> and kill processes.
> 2. do it as part of a repeatable sequence, such as here
>
> https://github.com/apache/incubator-slider/blob/develop/slider-funtest/src/test/groovy/org/apache/slider/funtest/lifecycle/AMFailuresIT.groovy#L87
>
> so yes, you are right: it's not needed in normal operation. It's only there
> to help test, verify & demo restart behaviour.
>
> On 15 December 2014 at 07:17, 杨浩 <[email protected]> wrote:
>
> > I have done an experiment: when I kill the AM process, all the containers
> > related to this ApplicationMaster restart. So the am-suicide method
> > may not be so useful.
> >
> > 2014-12-14 13:41 GMT+08:00 杨浩 <[email protected]>:
> > >
> > > > It's very useful for me to know that the configuration option is gone.
> > > > As you know, if the AM restarts, the components will not restart. But
> > > > the AM process may be killed, for example if the server that runs the
> > > > AM shuts down; will the components restart then?
> > >
> > > 2014-12-12 22:08 GMT+08:00 Steve Loughran <[email protected]>:
> > >>
> > >> That's something I think we cut out of the slider code a while back,
> > >> probably before slider 0.50
> > >>
> > >> It was added so that we could work with versions of Hadoop that didn't
> > >> have working support for the YARN AM restart feature, and not try to
> > >> use it there.
> > >>
> > >> Prior to Hadoop 2.4, the fields to enable it weren't in the code the
> > >> client used to request the feature, or in the data that came back from
> > >> YARN when the AM started. We used reflection to try to load the methods
> > >> if they weren't there. For extra fun, the method could be in the hadoop
> > >> JARs on the client, but not on the server, and as we were using the
> > >> pre-installed hadoop JARs on the server, we could end up setting the
> > >> option on the client, but not have it do anything.
> > >>
> > >> I think the flag was there to tell the tests whether or not the feature
> > >> was present in the destination cluster, so whether to run tests to kill
> > >> the AM and expect it to come back up *retaining the existing
> > >> containers*, that is, if the AM could be restarted without the running
> > >> application noticing.
> > >>
> > >> Everything works on Hadoop 2.6, so the option is gone; tests do kill
> > >> the AM and expect it to come back (there's a "slider am-suicide"
> > >> command for testing this).
> > >>
> > >> There's a property "slider.yarn.restart.limit" which sets a limit on
> > >> how many times slider should ask to restart; if unset you get the YARN
> > >> limit defined by "yarn.resourcemanager.am.max-retries" (plus some
> > >> windowing feature which handles intermittent timeouts over a long
> > >> running service). Set it to 1 and it should say "no restarts" (i.e. one
> > >> attempt to run slider is made, the first).
> > >>
> > >> It's covered in the
> > >> http://slider.incubator.apache.org/docs/client-configuration.html docs
> > >>
> > >> -steve
> > >>
> > >>
> > >>
> > >> On 12 December 2014 at 11:41, 杨浩 <[email protected]> wrote:
> > >>
> > >> > How do I configure this setting? When it is set to false, sometimes
> > >> > it works, and sometimes not.
> > >> >
> > >>
> > >> --
> > >> CONFIDENTIALITY NOTICE
> > >> NOTICE: This message is intended for the use of the individual or
> > >> entity to which it is addressed and may contain information that is
> > >> confidential, privileged and exempt from disclosure under applicable
> > >> law. If the reader of this message is not the intended recipient, you
> > >> are hereby notified that any printing, copying, dissemination,
> > >> distribution, disclosure or forwarding of this communication is
> > >> strictly prohibited. If you have received this communication in error,
> > >> please contact the sender immediately and delete it from your system.
> > >> Thank You.
> > >>
> > >
> >
>