IMO the alpha / beta / GA terminology makes sense and will make things clearer to users, which is good.
Some thoughts on the specifics of your proposal:

- You're suggesting we commit to a specific number of releases that a GA feature will be forward / backward compatible for. IMO our current commitment (one major release) is okay, but it would be good to strive to break compatibility as infrequently as possible. In the future, we may decide to do major releases less often, which will naturally lengthen the commitment times.
- I like the idea of phasing in the testing bar as features move from alpha -> beta -> GA. It would be good to point to examples of features where the testing is done "right" for each stage; that should help contributors know what to shoot for.
- Plenty of GA features today do not meet the testing bar you've mentioned, including some "day 1" features. This is fine, and a natural consequence of raising the testing bar over time, but we should have an idea of what we want to do about it. One possible approach is to require that tests be added to meet the bar whenever fixes or changes are made to the feature, but that leads to situations where a small change can't be made without adding a mountain of tests. IMO the amount of new testing should be commensurate with the scope of the change: a big refactor of a feature that doesn't have much testing should involve adding a mountain of tests, but we don't necessarily need to require that for a small bug fix or enhancement (although it would be great, of course!).
- For "beta", the definition you suggest is all negative ("not battle tested", "may change", "may not be compatible"). We should include something positive as well, to illustrate what makes beta better than alpha. How about "no major known issues" or "no major API changes planned"?
- I would suggest moving the "appropriate user-facing documentation" requirement to beta rather than GA. In order to have a useful beta testing period, we need good user-facing docs so people can try the feature out.
- I think we might want to leave some alpha features undocumented if their quality or stability is so low that they won't be useful to people who aren't developers. The goal would be to avoid clogging up the user-facing docs with a bunch of half-baked features; too much of that lowers the perceived quality of the project.

Now, thinking about specific features, I suggest we classify the current experimental features in the following way:

- Java 11 support: Beta or GA (depending on how good the test coverage is)
- HTTP remote task runner: Alpha (there aren't integration tests yet)
- Router process: GA
- Indexer process: Alpha or Beta (also depending on how good the test coverage is)
- Segment locking / minor compaction: Alpha
- Approximate histograms: GA, but deprecated (they are stable and have plenty of tests, but users should consider switching to DataSketches quantiles)
- Lookups: Beta
- Kinesis ingestion: GA (now that there are integration tests: https://github.com/apache/druid/pull/9724)
- Materialized view extension: Alpha
- Moments sketch extension: Alpha

On Mon, Jun 8, 2020 at 1:49 PM Suneet Saldanha <suneet.salda...@imply.io> wrote:

> Hi Druid devs!
>
> I've been thinking about our release process and would love to get your
> thoughts on how we manage new features.
>
> When a new feature is added, is it first marked as experimental?
> How do users know which features are experimental?
> How do we ensure that features do not break with each new release?
> Should the release manager manually check each feature works as part of
> the release process? This doesn't seem like it can scale.
> Should integration tests always be required if the feature is being added
> to core?
>
> To address these issues, I'd like to propose we introduce a feature
> lifecycle for all features so that we can set expectations for users
> appropriately - either in the docs, product or both.
> I'd like to propose something like this:
>
> * Alpha - Known major bugs / performance issues. Incomplete functionality.
>   Disabled by default.
> * Beta - Feature is not yet battle tested in production. API and
>   compatibility may change in the future. May not be forward / backward
>   compatible.
> * GA - Feature has appropriate user-facing documentation and testing so
>   that it won't regress with a version upgrade. Will be forward / backward
>   compatible for x releases (maybe 4? ~1 year)
>
> I think a model like this will allow us to continue to ship features
> quickly while keeping the release quality bar high, so that our users can
> continue to rely on Druid without worrying about upgrade issues.
>
> I understand that adding integration tests may not always make sense for
> early / experimental features when we're uncertain of the API or the
> broader use case we're trying to solve. This model would make it clear to
> our users which features are still in progress, and which ones they can
> expect to remain stable for a longer time.
>
> Below is an example of how I think this model can be applied to a new
> feature:
>
> This PR adds support for a new feature:
> https://github.com/apache/druid/pull/9449
>
> While it has been tested locally, there may be changes that enter Druid
> before the 0.19 release that break this feature, or more likely, a
> refactoring after 0.19 that breaks something in this feature. In this
> example, I think the feature should be marked as alpha, since future
> changes to the functionality are expected. At this stage integration
> tests are not expected. Once the feature is complete, there should be
> happy-path integration tests for the feature and it can graduate to Beta.
> After it has been running in production for a while, the feature can
> graduate to GA once we've added enough integration tests that we feel
> confident the feature will continue to work if the integration tests run
> successfully.
>
> I know this is a very long email, but I look forward to hearing your
> thoughts on this.
> Suneet
>
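P.S. One concrete note on the "disabled by default" expectation for alpha features: a runtime-property gate that defaults to off would make the alpha state explicit and auditable. The sketch below is purely illustrative; the `FeatureFlags` class and the `druid.feature.*` property naming are hypothetical, not an existing Druid API.

```java
import java.util.Properties;

// Hypothetical sketch (not an existing Druid API): gating an alpha feature
// behind a runtime property that defaults to disabled.
public class FeatureFlags {
    private final Properties runtimeProperties;

    public FeatureFlags(Properties runtimeProperties) {
        this.runtimeProperties = runtimeProperties;
    }

    // Alpha features are off unless an operator explicitly opts in;
    // the default value is "false" when the property is absent.
    public boolean isEnabled(String featureName) {
        return Boolean.parseBoolean(
            runtimeProperties.getProperty(
                "druid.feature." + featureName + ".enabled", "false"));
    }
}
```

An operator would then opt in explicitly, e.g. by setting `druid.feature.segmentLocking.enabled=true` in their runtime properties (property name again hypothetical), which keeps the "alpha means off by default" contract enforceable in one place.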