Thanks for the great input, @yan! I agree with almost all of it.

> Also the API is not just what the machine reads but all the documentation associated with it, right? It depends on what the documentation says; what the user _should_ expect.
I think different users may have different expectations, and the person who developed an API may understand it differently from some users as well. Our documentation should cover most cases. But if we didn't cover a case, or forgot to state it explicitly in the docs, should we give up on updating the API? It is just like user Alice calling something a bug while user Bob calls it a feature. I think we still need to raise such changes case by case to ensure most users are not affected by breaking API changes.

On Sat, Oct 15, 2016 at 6:55 AM, Vinod Kone <vinodk...@apache.org> wrote:

> We will chat about this in the upcoming community sync (Thursday 3 PM).
> So, please make sure to attend if you are interested.
>
> On Fri, Oct 14, 2016 at 3:44 PM, Yan Xu <xuj...@apple.com> wrote:
>
>>
>> On Fri, Oct 14, 2016 at 3:37 PM, Yan Xu <xuj...@apple.com> wrote:
>>
>>> Thanks Alex for starting this!
>>>
>>> In addition to comments below, I think it'll be helpful to keep the
>>> existing versioning doc concise and user-friendly while having a dedicated
>>> doc for the "implementation details" where precise requirements and
>>> procedures go. Maybe some duplication/cross-referencing is needed but Mesos
>>> developers will find the latter much more helpful while the users/framework
>>> developers will find the former easy to read.
>>>
>>> e.g., a similar split:
>>> https://github.com/kubernetes/kubernetes/blob/master/docs/api.md
>>> https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api_changes.md
>>> (which has a lot of details on how the kubernetes
>>> community is thinking about similar issues, which we can learn from)
>>>
>>> Jiang Yan Xu
>>>
>>> On Wed, Oct 12, 2016 at 9:34 AM, Alex Rukletsov <a...@mesosphere.com>
>>> wrote:
>>>
>>>> Folks,
>>>>
>>>> There have been a bunch of online [1, 2] and offline discussions about
>>>> our deprecation and versioning policy.
>>>> I found that people, including myself, read the versioning doc [3]
>>>> differently; moreover some aspects are not captured there. I would like
>>>> to start a discussion around this topic by sharing my confusions and
>>>> suggestions. This will hopefully help us stay on the same page and have
>>>> similar expectations. The second goal is to eliminate ambiguities from
>>>> the versioning doc (thanks Vinod for volunteering to update it).
>>>>
>>>
>>> +1 Let me know if there are things I can help with.
>>>
>>>>
>>>> 1. API vs. semantic changes.
>>>> The current versioning guide treats features (e.g. flags, metrics,
>>>> endpoints) and API differently: incompatible changes for the former are
>>>> allowed after a 6-month deprecation cycle, while for the latter they
>>>> require bumping a major version. I suggest we consolidate these policies.
>>>>
>>>
>>> I feel that the distinction is not API vs. semantic changes. A backwards
>>> compatible API guarantee should imply backwards compatible semantics (of
>>> the API).
>>> i.e., if a change in API doesn't cause the message to be dropped to the
>>> floor but leads to behavior change that causes problems in the system, it
>>> still breaks compatibility.
>>>
>>> IMO the distinction is more between:
>>> - Compatibility between components that are impossible/very unpleasant
>>>   to upgrade in lockstep - high priority for compatibility guarantee.
>>> - Compatibility between components that are generally bundled (modules)
>>>   or things that usually aren't built into automated tooling (e.g., the
>>>   /state endpoint) - more relaxed for now but we should explicitly exclude
>>>   them from the guarantee.
>>>
>>>>
>>>> We should also define and clearly explain what changes require bumping
>>>> the major version. I have no strong opinion here and would love to hear
>>>> what people think.
>>>> The original motivation for maintaining backwards compatibility is to
>>>> make sure vN schedulers can correctly work with vN API without being
>>>> updated. But what about semantic changes that do not touch the API? For
>>>> example, what if we decide to send fewer task health updates to
>>>> schedulers based on some health policy? It influences the flow of task
>>>> status updates; should such a change be considered compatible? Taking it
>>>> to an extreme, we may not even be able to fix some bugs because someone
>>>> may already rely on this behaviour!
>>>>
>>>
>>> API changes should warrant a major version bump. Also the API is not
>>> just what the machine reads but all the documentation associated with it,
>>> right? It depends on what the documentation says; what the user _should_
>>> expect.
>>>
>>> That said, I feel that these things are hard to talk about in the
>>> abstract. Even with a guideline, we still need to make case-by-case
>>> decisions. (e.g., has the documentation precisely defined this precise
>>> behavior? If not, is it reasonable for the users to expect some behavior
>>> because it's common sense? How bad is it if some behavior just changes a
>>> tiny bit?) Therefore we need to make sure the process for API changes is
>>> more rigorously defined.
>>>
>>> Whether something is a bug depends on whether the API does what it says
>>> it'll do. The line may sometimes be blurry but in general I don't feel it's
>>> a problem. If someone is relying on the behavior that is a bug, we should
>>> still help them fix it but the bug shouldn't count as "our guarantee".
>>>
>>>>
>>>> Another tightly related thing we should explicitly call out is
>>>> upgradability and rollback capabilities inside a major release.
>>>> Committing to this may significantly limit what we can change within a
>>>> major release; on the other side it will give users more time and a
>>>> better experience about using and maintaining Mesos clusters.
>>>>
>>>
>>> According to the versioning doc upgradability depends on whether you
>>> depend on deprecated/removed features.
>>>
>>> That paragraph should be explained more precisely:
>>> - "deprecated": means your system won't break but warnings are shown
>>>   (Maybe we should use some standard deprecation warning keywords so the
>>>   operator can monitor the log for such warnings!)
>>> - "removed": means it may break.
>>>
>>> If you deprecate a flag/env that interfaces with operator tooling in the
>>> next minor release, the operator basically has 6 months from the next minor
>>> release to change her tooling. I feel this is pretty acceptable.
>>> If you deprecate a flag/env variable that interfaces with the framework
>>> (executor) in the next minor release, I feel it may not be enough and it
>>> probably warrants a major version bump. So perhaps the API shouldn't be
>>> just the protos.
>>>
>>>>
>>>> 2. Versioned vs. unversioned protobufs.
>>>> Currently we have v1 and unnamed protobufs, which simultaneously mean
>>>> v0, v2, and internal. I am sometimes confused about what is the right
>>>> way to update or introduce a field or message there, do people feel the
>>>> same? How about splitting the unnamed version into explicit v0, v2, and
>>>> internal?
>>>>
>>>
>>> As haosdent mentioned, we have captured this in MESOS-6268. The benefit
>>> is clear but I guess people will be more motivated when we find some v2
>>> feature can't be made compatible with the v0 API. (Anand's point
>>> in MESOS-6016). On the other hand, if we cut v0 API access before that
>>> happens (is v0 API obsolete and should be removed 6 months after 1.0?) then
>>> we don't need to worry about v0 and can use unversioned protos as
>>> "internal"?
>>>
>>>>
>>>> Food for thought. It would be great if we can only maintain "diffs" to
>>>> the internal protobufs in the code, instead of duplicating them
>>>> altogether.
>>>>
>>>> 3. API and feature labelling.
>>>> I suggest introducing explicit labels for API and features, to ensure
>>>> users have the right assumptions about their lifetime while engineers
>>>> have the ability to change a wip feature in a non-compatible way. I
>>>> propose the following:
>>>> API: stable, non-stable, pure (not used by Mesos components)
>>>> Feature: experimental, normal.
>>>>
>>>
>>> +1 on formalizing the terminologies.
>>>
>>> Historically the distinction is not clear for the following:
>>>
>>> 1. The API has no compatibility guarantee at all.
>>> 2. The feature provided by this API is experimental
>>>
>>
>> To add to this point: because 2) logically doesn't apply to the "pure
>> (not used by Mesos components)" fields in the API, it could be more
>> confusing and thus require more precise definition.
>>
>>>
>>> IMO It's OK that we say that we don't distinguish the two (the API has
>>> no compatibility guarantee until the feature is fully released) but we have
>>> to make it clear.
>>> If we don't make such distinction, ALL API additions should be marked as
>>> unstable first and be changed to stable later (as a formal process).
>>>
>>>>
>>>> Looking forward to your thoughts and suggestions.
>>>> AlexR
>>>>
>>>> [1] https://www.mail-archive.com/user@mesos.apache.org/msg08025.html
>>>> [2] https://www.mail-archive.com/dev@mesos.apache.org/msg36621.html
>>>> [3] https://github.com/apache/mesos/blob/b2beef37f6f85a8c75e968136caa7a1f292ba20e/docs/versioning.md
>>>>
>>>
>>
>

--
Best Regards,
Haosdent Huang