Capabilities

Jordan West Wed, 18 Dec 2024 15:00:48 -0800

In a recent discussion on the pains of upgrading, one topic that came up was
a feature that Riak had called Capabilities [1]. A major pain with upgrades
is that each node independently decides when to start using new or modified
functionality. Even when we put this behind a config (like storage
compatibility mode) each node immediately enables the feature when the
config is changed and the node is restarted. This causes various types of
upgrade pain such as failed streams and schema disagreement. A
recent example of this is CASSANDRA-20118 [2]. In some cases operators can
prevent this from happening through careful coordination (e.g. ensuring
upgrade sstables only runs after the whole cluster is upgraded), but this
typically requires custom code in whatever control plane the operator is
using. A capabilities framework would distribute the state of what features
each node has (and each feature's status, e.g. enabled or not) so that the cluster
can choose to opt in to new features once the whole cluster has them
available. From experience, having this in Riak made upgrades a
significantly less risky process and also paved a path towards repeatable
downgrades. I think Cassandra would benefit from it as well.
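
As a rough illustration of the idea (not Riak's actual implementation, and
not a proposed Cassandra API), the sketch below has each node advertise which
features it has and whether they are enabled, and only treats a feature as
usable once every node reports it as enabled. All names here
(NodeCapabilities, cluster_enabled_features, "new_sstable_format") are
hypothetical and chosen only for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeCapabilities:
    """What one node advertises: feature name -> enabled on this node or not."""
    node_id: str
    features: dict[str, bool] = field(default_factory=dict)


def cluster_enabled_features(nodes: list[NodeCapabilities]) -> set[str]:
    """Return the features that every node both has and has enabled."""
    if not nodes:
        return set()
    # Start from the first node's enabled features, then intersect with
    # each remaining node so a single lagging node vetoes the feature.
    agreed = {name for name, enabled in nodes[0].features.items() if enabled}
    for node in nodes[1:]:
        agreed &= {name for name, enabled in node.features.items() if enabled}
    return agreed


# Mid-upgrade: node c is still on the old version and does not know the feature.
mid_upgrade = [
    NodeCapabilities("a", {"new_sstable_format": True}),
    NodeCapabilities("b", {"new_sstable_format": True}),
    NodeCapabilities("c", {}),
]
# Fully upgraded: every node advertises the feature as enabled.
fully_upgraded = [
    NodeCapabilities(n, {"new_sstable_format": True}) for n in ("a", "b", "c")
]

print(cluster_enabled_features(mid_upgrade))
print(cluster_enabled_features(fully_upgraded))
```

In this sketch the old node does not need to know about the new feature at
all: simply not advertising it is enough to keep the rest of the cluster from
using it. How the per-node state would actually be distributed is left open
here.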


Further, other tools like analytics could benefit from having this
information since currently it's up to the operator to manually determine
the state of the cluster in some cases.

I am considering drafting a CEP for this feature but wanted to
take the general temperature of the community and get some early thoughts
while working on the draft.

Looking forward to hearing y'all's thoughts,
Jordan

[1] https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72

[2] https://issues.apache.org/jira/browse/CASSANDRA-20118
