I have been reading some of the YARN code:

RMAppAttemptImpl.java, BaseFinalTransition:
        case KILLED:
        {
          // don't leave the tracking URL pointing to a non-existent AM
          appAttempt.setTrackingUrlToRMAppPage();
          appAttempt.invalidateAMHostAndPort();
          appEvent =
              new RMAppFailedAttemptEvent(applicationId,
                  RMAppEventType.ATTEMPT_KILLED,
                  "Application killed by user.", false);
        }
        break;
        case FAILED:
        {
          // don't leave the tracking URL pointing to a non-existent AM
          appAttempt.setTrackingUrlToRMAppPage();
          appAttempt.invalidateAMHostAndPort();
          if (appAttempt.submissionContext
            .getKeepContainersAcrossApplicationAttempts()
              && !appAttempt.isLastAttempt
              && !appAttempt.submissionContext.getUnmanagedAM()) {
            keepContainersAcrossAppAttempts = true;
          }
          appEvent =
              new RMAppFailedAttemptEvent(applicationId,
                RMAppEventType.ATTEMPT_FAILED, appAttempt.getDiagnostics(),
                keepContainersAcrossAppAttempts);

        }
        break;

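When the AM container fails, the attempt may be restarted and recover its state: if keepContainersAcrossApplicationAttempts is set, this is not the last attempt, and the AM is not unmanaged, the running containers are kept.
When the attempt is killed, it takes a different path: the RM sends ATTEMPT_KILLED with the keep-containers argument hard-coded to false.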
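A minimal Java sketch of the decision the two branches make, assuming the three conditions in the excerpt are the only inputs. `AttemptExitModel`, `Outcome` and `keepContainers` are hypothetical names used for illustration; they are not YARN APIs.

```java
// Hypothetical, simplified model of the KILLED/FAILED branches in
// BaseFinalTransition. None of these names exist in YARN.
final class AttemptExitModel {

    enum Outcome { KILLED, FAILED }

    /**
     * Returns true if the next attempt would inherit the running containers,
     * following the logic of the excerpt.
     */
    static boolean keepContainers(Outcome outcome,
                                  boolean keepContainersRequested,
                                  boolean isLastAttempt,
                                  boolean unmanagedAM) {
        switch (outcome) {
            case KILLED:
                // The KILLED branch always passes false: containers are never kept.
                return false;
            case FAILED:
                // The FAILED branch keeps containers only if the app asked for it,
                // more attempts remain, and the AM is managed by YARN.
                return keepContainersRequested && !isLastAttempt && !unmanagedAM;
            default:
                throw new IllegalArgumentException("unexpected outcome: " + outcome);
        }
    }

    public static void main(String[] args) {
        System.out.println(keepContainers(Outcome.FAILED, true, false, false)); // true
        System.out.println(keepContainers(Outcome.KILLED, true, false, false)); // false
        System.out.println(keepContainers(Outcome.FAILED, true, true, false));  // false
    }
}
```

The point of the sketch is that the KILLED outcome ignores all three flags, so an attempt that ends up on the KILLED path can never keep its containers.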


Later, I ran some experiments in pseudo-distributed mode:

Experiment 1:
    $ kill <SliderAppMaster pid>      # SIGTERM: all containers restart
Experiment 2:
    $ kill -6 <SliderAppMaster pid>   # SIGABRT: only the AM restarts
    $ kill <SliderAppMaster pid>      # SIGTERM: only the AM restarts

That's very interesting.

2014-12-16 18:34 GMT+08:00 Steve Loughran <[email protected]>:
>
> the suicide operation is only there for testing, for demonstrating that AM
> restart takes place. It lets us
> 1. kill an AM on a remote cluster where we don't have the rights to SSH in
> and kill processes.
> 2. do it as part of a repeatable sequence, such as here
>
> https://github.com/apache/incubator-slider/blob/develop/slider-funtest/src/test/groovy/org/apache/slider/funtest/lifecycle/AMFailuresIT.groovy#L87
>
> so yes, you are right: it's not needed in normal operation. It's only there
> to help test, verify & demo restart behaviour.
>
> On 15 December 2014 at 07:17, 杨浩 <[email protected]> wrote:
>
> > I have done an experiment: when I kill the AM process, all the containers
> > related to this ApplicationMaster restart. So the am-suicide command
> > may not be so useful.
> >
> > 2014-12-14 13:41 GMT+08:00 杨浩 <[email protected]>:
> > >
> > > It's very useful for me to know that the configuration option is gone.
> > > As you know, if the AM restarts, the components will not restart. But the
> > > AM process may be killed, for example if the server running the AM shuts
> > > down. Will the components restart then?
> > >
> > > 2014-12-12 22:08 GMT+08:00 Steve Loughran <[email protected]>:
> > >>
> > >> That's something I think we cut out of the Slider code a while back,
> > >> probably before Slider 0.50.
> > >>
> > >> It was added so that, on versions of Hadoop that didn't have working
> > >> support for the YARN AM restart feature, we didn't try to use it.
> > >>
> > >> Prior to Hadoop 2.4, the fields to enable it weren't in the code the
> > >> client used to request the feature, or in the data that came back from
> > >> YARN when the AM started. We used reflection to load the methods, since
> > >> they might not be there. For extra fun, the method could be in the
> > >> Hadoop JARs on the client but not on the server, and as we were using
> > >> the pre-installed Hadoop JARs on the server, we could end up setting the
> > >> option on the client without it doing anything.
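The reflection probe described above can be sketched as follows. This is not Slider's code: `OptionalFeature`, `trySetFlag`, `OldSubmissionContext` and `NewSubmissionContext` are hypothetical stand-ins for an older and a newer submission-context class, and the setter name simply mirrors the `getKeepContainersAcrossApplicationAttempts` getter in the YARN excerpt.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical sketch: call an optional boolean setter only if the class
// on the classpath actually has it.
final class OptionalFeature {

    /**
     * Looks up a public setter(boolean) by name and invokes it.
     * Returns false if the method does not exist (an older release).
     */
    static boolean trySetFlag(Object target, String setterName, boolean value) {
        try {
            Method m = target.getClass().getMethod(setterName, boolean.class);
            m.invoke(target, value);
            return true;
        } catch (NoSuchMethodException e) {
            // Feature absent on this classpath: don't try to use it.
            return false;
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("could not call " + setterName, e);
        }
    }

    public static void main(String[] args) {
        String setter = "setKeepContainersAcrossApplicationAttempts";
        System.out.println(trySetFlag(new OldSubmissionContext(), setter, true)); // false
        NewSubmissionContext ctx = new NewSubmissionContext();
        System.out.println(trySetFlag(ctx, setter, true));                        // true
        System.out.println(ctx.getKeepContainersAcrossApplicationAttempts());     // true
    }
}

// Hypothetical stand-in for a submission context on an older release.
class OldSubmissionContext { }

// Hypothetical stand-in for a submission context that has the feature.
class NewSubmissionContext {
    private boolean keep;
    public void setKeepContainersAcrossApplicationAttempts(boolean keep) { this.keep = keep; }
    public boolean getKeepContainersAcrossApplicationAttempts() { return keep; }
}
```

A probe like this only inspects the client's classpath, which is exactly the trap described in the paragraph above: it reports success even when the cluster's pre-installed Hadoop JARs ignore the option.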
> > >>
> > >> I think the flag was there to tell the tests whether or not the feature
> > >> was present in the destination cluster, and so whether to run tests that
> > >> kill the AM and expect it to come back up *retaining the existing
> > >> containers*; that is, whether the AM could be restarted without the
> > >> running application noticing.
> > >>
> > >> Everything works on Hadoop 2.6, so the option is gone; the tests do kill
> > >> the AM and expect it to come back (there's a "slider am-suicide" command
> > >> for testing this).
> > >>
> > >> There's a property "slider.yarn.restart.limit" which sets a limit on how
> > >> many times Slider should ask to restart; if unset, you get the YARN limit
> > >> defined by "yarn.resourcemanager.am.max-retries" (plus a windowing
> > >> feature which handles intermittent timeouts over a long-running
> > >> service). Set it to 1 to say "no restarts" (i.e. one attempt to run
> > >> Slider is made: the first).
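A rough sketch of how the limit above could resolve to a number of AM attempts. Only the property names come from the thread; `RestartLimit`, `effectiveMaxAttempts` and `allowedRestarts` are hypothetical, the windowing feature is left out, and capping a larger value at the YARN limit is an assumption about RM behaviour rather than something stated here.

```java
import java.util.Map;

// Hypothetical sketch, not Slider's implementation: combine an app-level
// restart limit with the cluster-wide YARN limit.
final class RestartLimit {

    static final String SLIDER_RESTART_LIMIT = "slider.yarn.restart.limit";

    /**
     * Number of AM attempts to request. Unset or non-positive falls back to
     * the YARN limit; a larger value is capped at it (assumption).
     */
    static int effectiveMaxAttempts(Map<String, String> sliderConf, int yarnMaxAttempts) {
        String raw = sliderConf.get(SLIDER_RESTART_LIMIT);
        if (raw == null) {
            return yarnMaxAttempts;
        }
        int requested = Integer.parseInt(raw.trim());
        if (requested <= 0 || requested > yarnMaxAttempts) {
            return yarnMaxAttempts;
        }
        return requested;
    }

    /** Every attempt after the first counts as a restart. */
    static int allowedRestarts(int maxAttempts) {
        return maxAttempts - 1;
    }

    public static void main(String[] args) {
        int yarnLimit = 2; // assumed cluster setting for the example
        int attempts = effectiveMaxAttempts(Map.of(SLIDER_RESTART_LIMIT, "1"), yarnLimit);
        // prints "1 attempt(s), 0 restart(s)"
        System.out.println(attempts + " attempt(s), " + allowedRestarts(attempts) + " restart(s)");
        int fallback = effectiveMaxAttempts(Map.of(), yarnLimit);
        // prints "2 attempt(s), 1 restart(s)"
        System.out.println(fallback + " attempt(s), " + allowedRestarts(fallback) + " restart(s)");
    }
}
```

Treating the value as a count of attempts, not of restarts, is what makes "1" mean "no restarts".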
> > >>
> > >> It's covered in the client configuration docs:
> > >> http://slider.incubator.apache.org/docs/client-configuration.html
> > >>
> > >> -steve
> > >>
> > >>
> > >>
> > >> On 12 December 2014 at 11:41, 杨浩 <[email protected]> wrote:
> > >>
> > >> > How do I set this configuration option? When I set it to false,
> > >> > sometimes it works and sometimes it doesn't.
> > >> >
> > >>
> > >> --
> > >> CONFIDENTIALITY NOTICE
> > >> NOTICE: This message is intended for the use of the individual or entity to
> > >> which it is addressed and may contain information that is confidential,
> > >> privileged and exempt from disclosure under applicable law. If the reader
> > >> of this message is not the intended recipient, you are hereby notified that
> > >> any printing, copying, dissemination, distribution, disclosure or
> > >> forwarding of this communication is strictly prohibited. If you have
> > >> received this communication in error, please contact the sender immediately
> > >> and delete it from your system. Thank You.
> > >>
> > >
> >
>
>
