In a recent discussion on the pains of upgrading, one topic that came up was a feature Riak had called Capabilities [1]. A major pain with upgrades is that each node independently decides when to start using new or modified functionality. Even when we put this behind a config (like storage compatibility mode), each node enables the feature as soon as the config is changed and the node is restarted. This causes various types of upgrade pain, such as failed streams and schema disagreement. A recent example of this is CASSANDRA-20118 [2]. In some cases operators can prevent this through careful coordination (e.g. ensuring upgrade sstables only runs after the whole cluster is upgraded), but that typically requires custom code in whatever control plane the operator is using.

A capabilities framework would distribute the state of which features each node has (and their status, e.g. enabled or not) so that the cluster can choose to opt in to new features once the whole cluster has them available. From experience, having this in Riak made upgrades a significantly less risky process and also paved a path towards repeatable downgrades. I think Cassandra would benefit from it as well.
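To make the opt-in idea concrete, here is a minimal sketch in Java. It is not taken from Riak or from Cassandra: the class name, the method names, and the feature string are all hypothetical, and it assumes each node's advertised feature set has already been distributed somehow (gossip or otherwise). The only rule it illustrates is that a feature is usable cluster-wide once every known node advertises it.

```java
import java.util.*;

// Hypothetical sketch: names and structure are illustrative only,
// not an existing Cassandra or Riak API.
final class CapabilitySketch {

    // Returns the features advertised by every node in the given view.
    // The view maps a node id to the set of features that node supports.
    static Set<String> clusterWide(Map<String, Set<String>> advertised) {
        Iterator<Set<String>> it = advertised.values().iterator();
        if (!it.hasNext()) {
            return Collections.emptySet();
        }
        Set<String> common = new TreeSet<>(it.next());
        while (it.hasNext()) {
            common.retainAll(it.next()); // keep only what this node also supports
        }
        return common;
    }

    // A node would consult this before using new functionality,
    // instead of deciding on its own at restart.
    static boolean canEnable(String feature, Map<String, Set<String>> advertised) {
        return clusterWide(advertised).contains(feature);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> view = new HashMap<>();
        view.put("node1", new HashSet<>(Arrays.asList("base", "new_sstable_format")));
        view.put("node2", new HashSet<>(Arrays.asList("base", "new_sstable_format")));
        view.put("node3", new HashSet<>(Arrays.asList("base"))); // not yet upgraded

        // Mixed-version cluster: the new feature stays off everywhere.
        System.out.println(canEnable("new_sstable_format", view)); // prints false

        // Once the last node is upgraded, the cluster can opt in.
        view.put("node3", new HashSet<>(Arrays.asList("base", "new_sstable_format")));
        System.out.println(canEnable("new_sstable_format", view)); // prints true
    }
}
```

A real design would also need to carry per-feature status (enabled or not), handle nodes that are down or unknown, and decide who flips the switch, all of which this sketch leaves out.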
Further, other tools like analytics could benefit from having this information, since currently it's up to the operator to manually determine the state of the cluster in some cases.

I am considering drafting a CEP for this feature but wanted to take the general temperature of the community and get some early thoughts while working on the draft.

Looking forward to hearing y'alls thoughts,
Jordan

[1] https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72
[2] https://issues.apache.org/jira/browse/CASSANDRA-20118