[
https://issues.apache.org/jira/browse/HADOOP-13035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265839#comment-15265839
]
Steve Loughran commented on HADOOP-13035:
-----------------------------------------
-1, as it
This is a pretty fundamental change. I would have also really liked to have
been pinged on this earlier, given my hands are all over the code as it stands.
While I acknowledge it isn't perfect, it does include experience on other
systems, and I did go through every single YARN service, repeatedly, until
things were stable.
This whole discrepancy between state-> starting and service->live is a
recurrent problem, but as you can see from things like web and IPC servers
starting in the background, service start() is inherently async; what code
really needs to wait upon is not the state change complete, but to await for
the started state to go live, which *may happen at some indeterminate state in
the future*
Without picking into this patch in detail, here are the places which have
caused most trouble over time, which any patch at what is a fundamental bit of
how the YARN services are constructed is going to have to look at
* subclasses of {{CompositeService}} adding new services in service start,
having to push them through their lifecycle enough to attach them to their
parent, then rely on the remaining of the serviceStart lifecycle to walk
themselves through.
* things going wrong in composite start and having to unroll the stack
* things trying to call stop() during start.
* the fact that calling start() on a service which is started *or in the
process of starting* is required to be a no-op.
* the issue as to when is serviceStop() invoked on a service when stop() is
called? Currently: not until you init(). it had better be after initing() now.
Can i also note that the ubquity of YarnClient means this class gets used a lot
downstream. Admittedly, I use it most of all, but you can essentially build
yarn based apps by aggregating their service lifecycles together. Which means
there is a risk that things may change. Before a descendant of this patch goes
in, someone is going to have to have built and tested slider's functional test
suite against a version of Hadoop with this turned on. I think they'll be able
to dodge doing the same in Hive, as Hive 1.2.x still uses a cut-and-paste of
the the 2.0 service model before the YARN-117 patch went in; which, if you've
ever seen how Spark Thriftserver abuses introspection to subclass (SPARK-8064,
SPARK-10793) you'll be grateful there.
I also to know what happens to YARN-679 and YARN-1564 with this. I propose
adding them first, as that will expand the codebase, and, as much of this is
code which I can migrate slider to, will make it easier for slider to adapt to
a change this fundamental.
Accordingly, I'll tag this as a depends-on there, rebase those two batches with
trunk and await reviews.
> Add states INITING and STARTING to YARN Service model to cover in-transition
> states.
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-13035
> URL: https://issues.apache.org/jira/browse/HADOOP-13035
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Attachments: 0001-HADOOP-13035.patch, 0002-HADOOP-13035.patch,
> 0003-HADOOP-13035.patch
>
>
> As per the discussion in YARN-3971 the we should be setting the service state
> to STARTED only after serviceStart()
> Currently {{AbstractService#start()}} is set
> {noformat}
> if (stateModel.enterState(STATE.STARTED) != STATE.STARTED) {
> try {
> startTime = System.currentTimeMillis();
> serviceStart();
> ..
> }
> {noformat}
> enterState sets the service state to proposed state. So in
> {{service.getServiceState}} in {{serviceStart()}} will return STARTED .
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]