Just to summarize Zameer's proposal buried deep in the quotation stream: we keep watch_secs but let users set it to 0 iff they have health checks enabled. No scheduler (updater) changes needed at all. Users will need to opt-in to use the new feature by changing watch_secs to 0 in their configs (or skipping it completely, which will set the default to 0 automatically).
I am +1 to this idea. Thanks Zameer! On Fri, Sep 2, 2016 at 6:28 PM, Zameer Manji <zma...@apache.org> wrote: > Resending my emails from a domain that has mailing list friendly DMARC > settings. > > It seems that we have achieved some consensus on the design, others feel > free to weigh in. > > On Fri, Sep 2, 2016 at 5:29 PM, Zameer Manji <zma...@uber.com> wrote: > >> Kai, >> >> We have had coupled deploys before, I don't think it's too terrible. It's >> something to note in the release notes and some operational pain for large >> users. >> >> On Fri, Sep 2, 2016 at 4:42 PM, 黄 凯 <texasred2...@hotmail.com> wrote: >> >> > Another concern is that once we rolled out the new executor, we should >> > rolled out a new client in order to use the health-check feature. Hence >> the >> > executor and client rolling out process seem to be coupled. >> > >> > >> > >> > >> > ------------------------------ >> > *发件人:* 黄 凯 <texasred2...@hotmail.com> >> > *发送时间:* 2016年9月3日 7:23 >> > *收件人:* Zameer Manji; dev@aurora.apache.org >> > *抄送:* Joshua Cohen; s...@apache.org; cald...@gmail.com; >> > rdelv...@binghamton.edu >> > *主题:* 答复: Discussion on review request 51536 >> > >> > >> > Thanks for the new proposal, Zameer. It sounds good to me. The benefit is >> > that it does not alter the current infrastructure too much. >> > >> > >> > However, there is one thing to keep in mind: >> > >> > we currently do a check to ensure watch_sec is longer than >> > initial_interval_secs. We will have to remove the alert message if we >> > choose to skip watch_sec by setting it as zero. >> > >> > >> > So the new configuration will not support executor-driven health check >> > unless the executors are rolled out 100%. >> > >> > >> > Does this tradeoff seems OK for us, Maxim? >> > >> > >> > Kai >> > >> > >> > ------------------------------ >> > *发件人:* Zameer Manji <zma...@uber.com> >> > *发送时间:* 2016年9月3日 6:53 >> > *收件人:* dev@aurora.apache.org >> > *抄送:* 黄 凯; Joshua Cohen; s...@apache.org; cald...@gmail.com; >> > rdelv...@binghamton.edu >> > *主题:* Re: Discussion on review request 51536 >> > >> > >> > >> > On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko <ma...@apache.org> >> > wrote: >> > >> >> Need to correct a few previous statements: >> >> >> >> > Also we do not want to expose this message to users. >> >> This is incorrect. The original design proposal suggested to show this >> >> message in the UI as: "Task is healthy" >> >> >> > >> > Does this mean the message in the status update is going to be exactly, >> > "Task is healthy" and the scheduler is going to check for this string in >> > the `TASK_RUNNING` status update? This means we are going to establish a >> > communication >> > mechanism between the executor and scheduler that's not defined by a >> > schema. I feel that's worse than putting JSON in there and having the >> > scheduler parse it. >> > >> > >> >> > The Mesos API isn't designed for packing arbitrary data >> >> > in the status update message. >> >> Don't think I agree, this is exactly what this field is for [1] and we >> >> already use it for other states [2]. >> >> >> > >> > I guess I should have said 'structured arbitrary data'. The >> informational, >> > messages are fine and we plumb them blindly into our logging and UI. I'm >> > not convinced we should start putting JSON or something more structured >> in >> > there. That's yet another schema we have and yet another versioning story >> > we have to go though. This also complicates matters for custom executor >> > authors. >> > >> > >> >> >> >> > I would be open to just saying that scheduler version >> >> > 0.16 (or 0.17) just assumes the executor transitions to >> >> > RUNNING once a task is healthy and dropping >> >> > `watch_secs`entirely. >> >> We can't drop 'watch_secs' entirely as we still have to babysit job >> >> updates that don't have health checks enabled. >> >> >> > >> > Understood. I guess we can keep it but I'm now frustrated that we have a >> > parameter that is ignored if we set some json in ExecutorConfig.data. >> > Ideally, we don't accept `watch_secs` if we want health check driven >> > updates. As mentioned before I don't like this implicit tightening of the >> > executor and the scheduler. >> > >> > >> >> >> >> As for my take on the above, I favor #1 as the simplest answer to an >> >> already simple question: "Should we use watch_secs for this instance >> >> or not?". That's pretty much it. Scheduler does not need any schema >> >> changes, know what health checks are or if a job has them enabled. At >> >> least not until we attempt to move to centralized health checks >> >> (AURORA-279) but that will be an entirely different design discussion. >> >> >> >> [1] - https://github.com/apache/mesos/blob/master/include/mesos/ >> >> mesos.proto#L1605. >> >> [2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a456 >> >> 3cfcff0442b7bf8383/src/main/python/apache/aurora/executor/ >> >> aurora_executor.py#L97 >> > >> > >> > >> > With all of this in mind, I have another proposal. Why can't we have the >> > executor changes (wait until the task is healthy for RUNNING) *and* read >> > `watch_secs` if it is set? Why not have both of these features and if >> users >> > want purely health checking driven updates they can set this value to 0 >> and >> > enable health checks. If they want to have both health checking and time >> > driven updates they can set this to value to the time that they care >> about. >> > If they just want time driven updates they can disable health checking >> and >> > set this value. >> > >> > Then there is no coupling between the executor and the scheduler except >> > for status updates and there is no dependency on the `message` field of >> the >> > status update. >> > >> > We could even treat `watch_secs` as minimum time in STARTING + RUNNING >> > instead of RUNNING with this change and it becomes the lower bound in the >> > update transition speed. This can ensure that users don't deploy "too >> fast" >> > and end up overwhelming other services if they are deployed too quickly. >> > >> > >> > >> >> >> >> >> >> On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zma...@apache.org> wrote: >> >> > *cc: Renan* >> >> > >> >> > I think there is some disagreement/discussion on the review because we >> >> have >> >> > not achieved consensus on the design. Since the design doc was >> written, >> >> > Aurora adopted multiple executor support as well non HTTP based >> >> > healthchecking. This invalidates some parts of the original design. I >> >> think >> >> > all of the solutions here are possible amendments to the design doc. >> >> > >> >> > I am not in favor of Solution 2 at all because status updates between >> >> > executor <-> agent <-> master <-> scheduler are designed to update the >> >> > framework of updates to the task and not really designed to send >> >> arbitrary >> >> > information. Just because the Mesos API provides us with a string >> field >> >> > doesn't mean we should try to pack in arbitrary data. Also, it isn't >> >> clear >> >> > what other capabilities we might add in the future so I'm unconvinced >> >> that >> >> > capabilities needs to exist at all. My fear is that we will create the >> >> > infrastructure for capabilities just to serve this need and nothing >> >> else. >> >> > >> >> > I object to Solution 1 along the same lines. The Mesos API isn't >> >> designed >> >> > for packing arbitrary data in the status update message and I don't >> >> think >> >> > we should abuse that and rely on that. Also our current infrastructure >> >> just >> >> > plumbs the message to the UI and I think displaying capabilities is >> not >> >> > something we should do. >> >> > >> >> > I am in favor of Solution 3 which is as close as possible to the >> >> original >> >> > design in the design doc. The design doc says the following: >> >> > >> >> > Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs >> >> >> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd802 >> >> 25a3987b7cc7a8389a2/docs/configuration-reference.md# >> updateconfig-objects >> >> >) >> >> >> grace period any time it detects a named port ‘health’ in task >> >> >> configuration. A RUNNING instance status will signify the end of >> >> instance >> >> >> update. >> >> > >> >> > >> >> > Instead of detecting the 'health' port in the task configuration, we >> >> make >> >> > enabling this feature explicitly by enabling a bit in the task >> >> > configuration with the `executorDrivenUpdates` bit. >> >> > >> >> > I understand this option makes this feature more complex because it >> >> > requires a schema change and requires operators to deploy the executor >> >> to >> >> > all agents before upgrading the client. However, I think that's a one >> >> time >> >> > operational cost as a opposed to long lived design choices that will >> >> affect >> >> > the code. >> >> > >> >> > Further Solution 3 is the most amenable to custom executors and >> >> continues >> >> > our tradition of treating executors as opaque black boxes. I think >> >> there is >> >> > a lot of value in treating executors as black boxes as it leaves the >> >> door >> >> > open to switching our executor to something else and doesn't impose a >> >> > burden to others that want to write their own. >> >> > >> >> > Alternatively, if amending the schema is too much work, I would be >> open >> >> to >> >> > just saying that scheduler version 0.16 (or 0.17) just assumes the >> >> executor >> >> > transitions to RUNNING once a task is healthy and dropping >> `watch_secs` >> >> > entirely. We can put it in the release notes that operators must >> deploy >> >> the >> >> > executor to 100% before deploying the scheduler. >> >> > >> >> > >> >> > On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <texasred2...@hotmail.com> wrote: >> >> > >> >> >> Hi Folks, >> >> >> >> >> >> I'm currently working on a feature on aurora scheduler and executor. >> >> The >> >> >> implementation strategy became controversial on the review board, so >> I >> >> was >> >> >> wondering if I should broadcast it to more audience and initiate a >> >> >> discussion. Please feel free to let me know your thoughts, your help >> is >> >> >> greatly appreciated! >> >> >> >> >> >> The high level goal of this feature is to improve reliability and >> >> >> performance of the Aurora scheduler job updater, by relying on health >> >> check >> >> >> status rather than watch_secs timeout when deciding an individual >> >> instance >> >> >> update state. >> >> >> >> >> >> Please see the original review request * >> https://reviews.apache.org/r/ >> >> 51536/ >> >> >> <https://reviews.apache.org/r/51536/> * >> >> >> aurora JIRA ticket *https://issues.apache.org/jira/browse/AURORA-894 >> >> >> <https://issues.apache.org/jira/browse/AURORA-894>* >> >> >> design doc *https://docs.google.com/docum >> >> ent/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit# >> >> >> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm >> >> 10NXSxEWR0a-21FP5d94/edit#>* >> >> >> for more details and background. >> >> >> >> >> >> Note: The design doc becomes a little bit outdated on the "scheduler >> >> >> change summary" part (this is what the review request trying to >> >> address). >> >> >> As a result, I've left some comment to clarify the latest proposed >> >> >> implementation plan for scheduler change. >> >> >> >> >> >> There are two questions I'm trying to address here: >> >> >> *1. How does the scheduler infer the executor version and be backward >> >> >> compatible?* >> >> >> *2. Where do we determine if health check is enabled?* >> >> >> >> >> >> In short, there are 3 different solutions proposed on the review >> board. >> >> >> >> >> >> In the first two approaches, the scheduler will rely on a string to >> >> >> determine the executor version. We determine whether health check is >> >> >> enabled merely on executor side. There will be communication between >> >> the >> >> >> executor and the scheduler. >> >> >> *Solution 1: * >> >> >> *vCurrent executor sends a message in its health check thread during >> >> >> RUNNING state transition, and the vCurrent updater will infer the >> >> executor >> >> >> version from the presence of this message, and skip the watch_secs if >> >> >> necessary.* >> >> >> >> >> >> *Solution 2:* >> >> >> *Instead of relying on the presence of an arbitrary string in the >> >> message, >> >> >> rely on the presence of a string like: >> >> >> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and >> >> >> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically >> just >> >> >> formalizing the mechanism and making it a bit more future proof.* >> >> >> >> >> >> In the third solution, the scheduler infers the executor version from >> >> the >> >> >> JobUpdateSettings on scheduler side. >> >> >> *Solution 3:* >> >> >> *Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates', >> if >> >> >> that is set, the scheduler assumes that the transition from STARTING >> -> >> >> >> RUNNING makes the executor healthy and concurrently, we release >> >> thermos and >> >> >> change HealthCheckConfig to say that it should only go to running >> after >> >> >> healthy*. >> >> >> >> >> >> *Pros and Cons:* >> >> >> The main benefit of Solution 1 is: >> >> >> 1. By using the message in task status update, we don't have to make >> >> any >> >> >> schema change, which makes the design simple. >> >> >> 2. The feature is fully backward-compatible. When we roll out the >> >> vCurrent >> >> >> schedulers and executors, we do not have to instruct the users to >> >> provide >> >> >> additional field in the Job or Update configs, which could confuses >> >> >> customers when the vPrev and vCurrent executor coexist in the >> cluster. >> >> >> >> >> >> Concerns: >> >> >> Relying on the presence of a message makes things brittle. Also we do >> >> not >> >> >> want to expose this message to users. >> >> >> >> >> >> The benefit of Solution 2 is making the feature more future proof. >> >> >> However, if we do not envision a new executor feature in the short >> >> term, >> >> >> it's not too much different from Solution 1. >> >> >> >> >> >> The benefits of Solution 3 include: >> >> >> 1. We support more than just thermos now (and others rely on custom >> >> >> executors). >> >> >> 2. A lot of things in Aurora treat the executor as opaque. The status >> >> >> update message sent by executor should not be visible to users only >> if >> >> it's >> >> >> an error message. >> >> >> >> >> >> Concerns: >> >> >> 1. In addition to the ‘executorDrivenUpdates' bit that identifies the >> >> >> executor version, we still need to notify the scheduler if health >> >> check is >> >> >> enabled on vCurrent executor, if not, the scheduler must be able to >> >> fall >> >> >> back to use watch_secs. >> >> >> 2. The users have to provide an additional field in their .aurora >> >> config >> >> >> files. The feature wouldn't be available unless new clients are >> rolled >> >> out >> >> >> as well. >> >> >> >> >> >> Please let me know if I understand your suggestions correctly and >> >> >> hopefully everyone is on the same page! >> >> >> >> >> >> Thanks, >> >> >> >> >> >> Kai >> >> >> >> >> >> > >> > >> > >> > -- >> > Zameer Manji >> > >> >> >> >> -- >> Zameer Manji >>