One minor clarification: ETS is entirely in memory (unless you explicitly dump it to disk or use DETS), so the equivalence to a local system table is only partially accurate, but I think the parallel holds for what I was describing.
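
For anyone who hasn't looked at the Riak code, the shape of the data I describe below was roughly the following. This is just a sketch in Java with made-up names, not a proposed API: an in-memory map from capability name to supported modes in preference order, with additions and reads as the only operations (mirroring the ETS table mentioned below).

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch only: a local, in-memory view of capabilities, analogous to Riak's
    // ETS table; names are illustrative, not a proposed Cassandra API.
    public final class LocalCapabilities
    {
        // capability name -> supported modes, most preferred first
        private final Map<String, List<String>> supported = new ConcurrentHashMap<>();

        // "Addition" is the only write: merge in new modes, never remove,
        // and keep the existing preference order.
        public void add(String capability, List<String> modes)
        {
            supported.merge(capability, List.copyOf(modes), (existing, incoming) -> {
                List<String> merged = new ArrayList<>(existing);
                for (String mode : incoming)
                    if (!merged.contains(mode))
                        merged.add(mode);
                return List.copyOf(merged);
            });
        }

        // Reads are purely local and cheap, so they can sit on a hot path.
        public List<String> get(String capability)
        {
            return supported.getOrDefault(capability, List.of());
        }
    }
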
Jordan

On Fri, Dec 20, 2024 at 09:07 Jordan West <jorda...@gmail.com> wrote:

> Benedict, I agree with you TCM might be overkill for capabilities. It’s truly something that’s fine to be eventually consistent. Riak’s implementation used a local ETS table (ETS is built into Erlang - the equivalent for us would be a local-only system table) and an efficient and reliable gossip protocol. The data was basically a simple CRDT (a map<string, list<string>> of supported features in preference order, with the only operations being additions and reads).
>
> So I agree with you that we could be using TCM as a hammer for every nail here. But I’m also hesitant to introduce something new. Distributed tables, or a virtual table with some way to aggregate across the cluster, would also work. In either case we would need a local cache (like Denylist).
>
> From a requirements perspective, reads need to be local (because they may be done in a hot path) but writes can be slow (they typically only change on startup or during operator intervention).
>
> Jordan
>
> On Fri, Dec 20, 2024 at 01:53 Benedict <bened...@apache.org> wrote:
>
>> If you perform a read from a distributed table on startup you will find the latest information. What catch-up are you thinking of? I don’t think any of the features we talked about need a log, only the latest information.
>>
>> We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here.
>>
>> Regarding disagreements: if you use LWTs then there are no consistency issues to worry about.
>>
>> Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling.
>>
>> On 20 Dec 2024, at 09:43, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>
>> I find TCM way more comfortable to work with. The capability of the log being replayed on restart and catching up with everything else automatically is a godsend. If we had that on "good old distributed tables", is it not true that we would need to take extra care of that ourselves, e.g. we would need to repair it etc.? It might be a source of discrepancies / disagreements. TCM is just "maintenance-free" and _just works_.
>>
>> I think I was also investigating distributed tables but was just pulled towards TCM naturally because of its goodies.
>>
>> On Fri, Dec 20, 2024 at 10:08 AM Benedict <bened...@apache.org> wrote:
>>
>>> TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta-config problems where we can’t rely on the rest of the database working. Particularly placement issues, which is why schema and membership need to live there.
>>>
>>> It should be possible to use distributed system tables just fine for capabilities, config and guardrails.
>>>
>>> That said, it’s possible config might be better represented as part of the schema (and we already store some relevant config there), in which case it would live in TCM automatically. Migrating existing configs to a distributed setup will be fun however we do it, though.
>>> Capabilities also feel naturally related to other membership information, so TCM might be the most suitable place, particularly for handling downgrades after capabilities have been enabled (if we ever expect to support turning off capabilities and then downgrading - which today we mostly don't).
>>>
>>> On 20 Dec 2024, at 08:42, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>
>>> Jordan,
>>>
>>> I also think that having it on TCM would be ideal and we should explore this path first before doing anything custom.
>>>
>>> Regarding my idea about the guardrails in TCM, when I prototyped that and wanted to make it happen, there was a little bit of pushback (1) (even though a super reasonable one) that TCM is just too young at the moment and it would be desirable to go through some stabilisation period.
>>>
>>> Another idea was that we should not make just guardrails happen but that the whole config should be in TCM. From what I put together, Sam / Alex do not seem to be opposed to this idea, rather the opposite, but having a CEP about that is way more involved than having just guardrails there. I consider guardrails to be kind of special and I do not think that having all configuration in TCM (which guardrails are part of) is an absolute must in order to deliver that. I may start with a guardrails CEP and you may explore a Capabilities CEP on TCM too, if that makes sense?
>>>
>>> I just wanted to raise the point about the time this would be delivered. If Capabilities are built on TCM, and I wanted to do Guardrails on TCM too but was told it is probably too soon, I guess you would experience something similar.
>>>
>>> Sam's comment is from May and maybe a lot has changed since then and his comment is not applicable anymore. It would be great to know whether we could already build on top of the current trunk or whether we will have to wait until 5.1/6.0 is delivered.
>>>
>>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326
>>>
>>> On Fri, Dec 20, 2024 at 2:17 AM Jordan West <jorda...@gmail.com> wrote:
>>>
>>>> Firstly, glad to see the support and enthusiasm here and in the recent Slack discussion. I think there is enough for me to start drafting a CEP.
>>>>
>>>> Stefan, global configuration and capabilities do have some overlap but not full overlap. For example, you may want to set globally that a cluster enables feature X or control the threshold for a guardrail, but you still need to know if all nodes support feature X or have that guardrail; the latter is what capabilities target. I do think capabilities are a step towards supporting global configuration, and the work you described is another step (that we could do after capabilities or in parallel with them in mind). I am also supportive of exploring global configuration for the reasons you mentioned.
>>>>
>>>> In terms of how capabilities get propagated across the cluster, I hadn't put much thought into it yet past likely TCM, since this will be a new feature that lands after TCM. In Riak, we had gossip (but more mature than C*'s -- this was an area I contributed to a lot so very familiar) to disseminate less critical information such as capabilities, and a separate layer that did TCM. Since we don't have this in C*, I don't think we would want to build a separate distribution channel for capabilities metadata when we already have TCM in place. But I plan to explore this more as I draft the CEP.
>>>>
>>>> Jordan
>>>>
>>>> On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>
>>>>> Hi Jordan,
>>>>>
>>>>> What would this look like from the implementation perspective? I was experimenting with transactional guardrails where an operator would control the content of a virtual table which would be backed by TCM, so whatever guardrail we changed would be automatically and transparently propagated to every node in the cluster. The POC worked quite nicely. TCM is just a vehicle to commit a change which would spread around, and all these settings would survive restarts. We would have the same configuration everywhere, which is not currently the case because guardrails are configured per node and, if not persisted to yaml, their values would be forgotten on restart.
>>>>>
>>>>> Guardrails are just an example; the obvious extension is to expand this idea to the whole configuration in yaml. Of course, not all properties in yaml make sense to be the same cluster-wide (ip addresses etc ...), but the ones which do would again be set the same way everywhere.
>>>>>
>>>>> The approach I described above is that we make sure the configuration is the same everywhere, hence there can be no misunderstanding about what features this or that node has: if we say that all nodes have to have a particular feature, because we said so in the TCM log, then on restart / replay a node will "catch up" with whatever features it is asked to turn on.
>>>>>
>>>>> Your approach seems to be that we distribute which capabilities / features a cluster supports and that each individual node then configures itself (or not) to comply?
>>>>>
>>>>> Is there any intersection between these approaches? At first sight they seem related. How is one different from the other from your point of view?
>>>>>
>>>>> Regards
>>>>>
>>>>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593
>>>>>
>>>>> On Thu, Dec 19, 2024 at 12:00 AM Jordan West <jw...@apache.org> wrote:
>>>>>
>>>>>> In a recent discussion on the pains of upgrading, one topic that came up was a feature that Riak had called Capabilities [1]. A major pain with upgrades is that each node independently decides when to start using new or modified functionality. Even when we put this behind a config (like storage compatibility mode), each node immediately enables the feature when the config is changed and the node is restarted. This causes various types of upgrade pain such as failed streams and schema disagreement. A recent example of this is CASSANDRA-20118 [2]. In some cases operators can prevent this from happening through careful coordination (e.g. ensuring upgrade sstables only runs after the whole cluster is upgraded) but this typically requires custom code in whatever control plane the operator is using. A capabilities framework would distribute the state of what features each node has (and their status, e.g. enabled or not) so that the cluster can choose to opt in to new features once the whole cluster has them available.
From experience, having this in Riak made upgrades a >>>>>> significantly less risky process and also paved a path towards repeatable >>>>>> downgrades. I think Cassandra would benefit from it as well. >>>>>> >>>>>> Further, other tools like analytics could benefit from having this >>>>>> information since currently it's up to the operator to manually determine >>>>>> the state of the cluster in some cases. >>>>>> >>>>>> I am considering drafting a CEP proposal for this feature but wanted >>>>>> to take the general temperature of the community and get some early >>>>>> thoughts while working on the draft. >>>>>> >>>>>> Looking forward to hearing y'alls thoughts, >>>>>> Jordan >>>>>> >>>>>> [1] >>>>>> https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72 >>>>>> >>>>>> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118 >>>>>> >>>>>