One minor clarification: ETS is entirely in memory (unless you explicitly
dump it to disk or use DETS), so the equivalence to a local system table is
only partially accurate, but I think the parallel is fine for what I was
describing.

Jordan

On Fri, Dec 20, 2024 at 09:07 Jordan West <jorda...@gmail.com> wrote:

> Benedict, I agree with you that TCM might be overkill for capabilities. It’s
> truly something that’s fine to be eventually consistent. Riak’s
> implementation used a local ETS table (ETS is built into Erlang - the
> equivalent for us would be a local-only system table) and an efficient and
> reliable gossip protocol. The data was basically a simple CRDT (a
> map<string, list<string>> of supported features in preference order, with
> the only operations being additions and reads).
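>
> To make the shape of that data concrete, here is a minimal sketch in Java
> (all names are illustrative, not Riak's actual API, and the exact merge
> rule is an assumption) of a grow-only capability map where the only
> operations are additions and local reads:
>
>     import java.util.LinkedHashSet;
>     import java.util.List;
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     // Sketch: feature name -> supported modes in preference order. Because
>     // entries are only ever added, concurrent updates merge with a simple
>     // per-key union, which is what makes the gossiped structure a CRDT.
>     public final class CapabilityMap
>     {
>         private final ConcurrentHashMap<String, List<String>> features = new ConcurrentHashMap<>();
>
>         /** Register (or extend) the modes this node supports for a feature. */
>         public void add(String feature, List<String> modesInPreferenceOrder)
>         {
>             features.merge(feature, List.copyOf(modesInPreferenceOrder), CapabilityMap::union);
>         }
>
>         /** Purely local, read-only lookup; never touches the network. */
>         public List<String> get(String feature)
>         {
>             return features.getOrDefault(feature, List.of());
>         }
>
>         /** Merge a map received via gossip into the local state. */
>         public void mergeFrom(Map<String, List<String>> remote)
>         {
>             remote.forEach((f, modes) -> features.merge(f, modes, CapabilityMap::union));
>         }
>
>         private static List<String> union(List<String> left, List<String> right)
>         {
>             LinkedHashSet<String> merged = new LinkedHashSet<>(left);
>             merged.addAll(right); // keeps existing preference order, appends newly learned modes
>             return List.copyOf(merged);
>         }
>     }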
>
> So I agree with you that we could be using TCM as a hammer for every nail
> here. But I’m also hesitant to introduce something new. Distributed tables,
> or a virtual table with some way to aggregate across the cluster, would
> also work. In either case we would need a local cache (like the Denylist).
>
> From a requirements perspective, reads need to be local (because they may
> be done in a hot path) but writes can be slow (the values typically only
> change at startup or during operator intervention).
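>
> As a rough sketch of that split (class name and refresh mechanism are
> hypothetical, just to illustrate the Denylist-style cache): the hot path
> only ever reads an immutable in-memory snapshot, and the snapshot is
> replaced whenever the distributed state changes:
>
>     import java.util.Map;
>
>     // Sketch: reads are a volatile field load, cheap enough for a hot path;
>     // writes swap the whole snapshot and are expected to be rare (startup,
>     // operator action).
>     public final class CapabilityCache
>     {
>         private volatile Map<String, Boolean> enabled = Map.of();
>
>         public boolean isEnabled(String capability)
>         {
>             return enabled.getOrDefault(capability, Boolean.FALSE); // purely local read
>         }
>
>         /** Invoked by whatever propagation mechanism we pick (TCM, gossip, table poll). */
>         public void refresh(Map<String, Boolean> clusterState)
>         {
>             enabled = Map.copyOf(clusterState); // single cheap publication of the new state
>         }
>     }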
>
> Jordan
>
>
>
> On Fri, Dec 20, 2024 at 01:53 Benedict <bened...@apache.org> wrote:
>
>> If you perform a read from a distributed table on startup, you will find
>> the latest information. What catch-up are you thinking of? I don’t think
>> any of the features we talked about need a log, only the latest
>> information.
>>
>> We can (and should) probably introduce event listeners for distributed
>> tables, as this is also a really great feature, but I don’t think this
>> should be necessary here.
>>
>> Regarding disagreements: if you use LWTs then there are no consistency
>> issues to worry about.
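>>
>> For illustration only, a sketch of what that could look like through the
>> Java driver (the system_distributed.capabilities table and its columns are
>> hypothetical, not something that exists today):
>>
>>     import com.datastax.oss.driver.api.core.CqlSession;
>>     import com.datastax.oss.driver.api.core.cql.SimpleStatement;
>>
>>     // Sketch: writing the capability state with an LWT so concurrent updates
>>     // from different nodes serialize via Paxos instead of racing on timestamps.
>>     public final class CapabilityWriter
>>     {
>>         private final CqlSession session;
>>
>>         public CapabilityWriter(CqlSession session)
>>         {
>>             this.session = session;
>>         }
>>
>>         public void enable(String capability)
>>         {
>>             session.execute(SimpleStatement.newInstance(
>>                 "UPDATE system_distributed.capabilities SET enabled = true " +
>>                 "WHERE name = ? IF enabled = false", capability));
>>         }
>>     }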
>>
>> Again, I’m not opposed to using TCM, although I am a little worried TCM
>> is becoming our new hammer with everything a nail. It would be better IMO
>> to keep TCM scoped to essential functionality as it’s critical to
>> correctness. Perhaps we could extend its APIs to less critical services
>> without intertwining them with membership, schema and epoch handling.
>>
>> On 20 Dec 2024, at 09:43, Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>> 
>>
>> I find TCM way more comfortable to work with. The ability to replay the
>> log on restart and catch up with everything else automatically is a
>> godsend. If we had that on "good old distributed tables", wouldn’t we need
>> to take extra care of it, e.g. we would need to repair it etc.? That could
>> be a source of discrepancies / disagreements. TCM is just
>> "maintenance-free" and _just works_.
>>
>> I think I was also investigating distributed tables but was just pulled
>> towards TCM naturally because of its goodies.
>>
>> On Fri, Dec 20, 2024 at 10:08 AM Benedict <bened...@apache.org> wrote:
>>
>>> TCM is a perfectly valid basis for this, but TCM is only really
>>> *necessary* to solve meta config problems where we can’t rely on the rest
>>> of the database working. Particularly placement issues, which is why schema
>>> and membership need to live there.
>>>
>>> It should be possible to use distributed system tables just fine for
>>> capabilities, config and guardrails.
>>>
>>> That said, it’s possible config might be better represented as part of
>>> the schema (and we already store some relevant config there) in which case
>>> it would live in TCM automatically. Migrating existing configs to a
>>> distributed setup will be fun however we do it though.
>>>
>>> Capabilities also feel naturally related to other membership
>>> information, so TCM might be the most suitable place, particularly for
>>> handling downgrades after capabilities have been enabled (if we ever expect
>>> to support turning off capabilities and then downgrading - which today we
>>> mostly don’t).
>>>
>>> On 20 Dec 2024, at 08:42, Štefan Miklošovič <smikloso...@apache.org>
>>> wrote:
>>>
>>> 
>>> Jordan,
>>>
>>> I also think that having it on TCM would be ideal and we should explore
>>> this path first before doing anything custom.
>>>
>>> Regarding my idea about guardrails in TCM, when I prototyped that and
>>> wanted to make it happen, there was a little bit of pushback (1) (though a
>>> super reasonable one) that TCM is just too young at the moment and it
>>> would be desirable to go through some stabilisation period.
>>>
>>> Another idea was that we should not make just guardrails happen but that
>>> the whole config should be in TCM. From what I put together, Sam / Alex do
>>> not seem to be opposed to this idea, rather the opposite, but a CEP about
>>> that is way more involved than having just guardrails there. I consider
>>> guardrails to be kind of special, and I do not think that having all
>>> configuration in TCM (which guardrails are part of) is an absolute must in
>>> order to deliver that. I may start with a guardrails CEP and you may
>>> explore a Capabilities CEP on TCM too, if that makes sense?
>>>
>>> I just wanted to raise a point about when this could be delivered. If
>>> Capabilities are built on TCM: I wanted to do Guardrails on TCM too but
>>> was told it is probably too soon, so I guess you would run into something
>>> similar.
>>>
>>> Sam's comment is from May, and maybe a lot has changed since then so that
>>> his comment is no longer applicable. It would be great to know whether we
>>> could already build on top of the current trunk or whether we would have
>>> to wait until 5.1/6.0 is delivered.
>>>
>>> (1)
>>> https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326
>>>
>>> On Fri, Dec 20, 2024 at 2:17 AM Jordan West <jorda...@gmail.com> wrote:
>>>
>>>> Firstly, glad to see the support and enthusiasm here and in the recent
>>>> Slack discussion. I think there is enough for me to start drafting a CEP.
>>>>
>>>> Stefan, global configuration and capabilities do have some overlap, but
>>>> not full overlap. For example, you may want to set globally that a cluster
>>>> enables feature X or control the threshold for a guardrail, but you still
>>>> need to know whether all nodes support feature X or have that guardrail;
>>>> the latter is what capabilities targets. I do think capabilities are a
>>>> step towards supporting global configuration, and the work you described
>>>> is another step (that we could do after capabilities or in parallel with
>>>> them in mind). I am also supportive of exploring global configuration for
>>>> the reasons you mentioned.
>>>>
>>>> In terms of how capabilities get propagated across the cluster, I
>>>> hadn't put much thought into it yet beyond likely TCM, since this will be
>>>> a new feature that lands after TCM. In Riak, we had gossip (but more
>>>> mature than C*'s -- this was an area I contributed to a lot, so I'm very
>>>> familiar with it) to disseminate less critical information such as
>>>> capabilities, and a separate layer that did TCM. Since we don't have that
>>>> in C*, I don't think we would want to build a separate distribution
>>>> channel for capabilities metadata when we already have TCM in place. But
>>>> I plan to explore this more as I draft the CEP.
>>>>
>>>> Jordan
>>>>
>>>> On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič <
>>>> smikloso...@apache.org> wrote:
>>>>
>>>>> Hi Jordan,
>>>>>
>>>>> what would this look like from the implementation perspective? I was
>>>>> experimenting with transactional guardrails where an operator would
>>>>> control the content of a virtual table backed by TCM, so whatever
>>>>> guardrail we changed would be automatically and transparently propagated
>>>>> to every node in the cluster. The POC worked quite nicely. TCM is just a
>>>>> vehicle to commit a change which then spreads around, and all these
>>>>> settings would survive restarts. We would have the same configuration
>>>>> everywhere, which is not currently the case, because guardrails are
>>>>> configured per node and, if not persisted to yaml, their values are
>>>>> forgotten on restart.
>>>>>
>>>>> Guardrails are just an example; the obvious next step is to expand this
>>>>> idea to the whole configuration in yaml. Of course, not all properties in
>>>>> yaml make sense to be the same cluster-wide (ip addresses etc ...), but
>>>>> the ones which do would again be set the same way everywhere.
>>>>>
>>>>> The approach I described above makes sure that the configuration is the
>>>>> same everywhere, hence there can be no misunderstanding about what
>>>>> features this or that node has: if we say that all nodes have to have a
>>>>> particular feature because we said so in the TCM log, then on restart /
>>>>> replay a node will "catch up" with whatever features it is asked to turn
>>>>> on.
>>>>>
>>>>> Your approach seems to be that we distribute which capabilities /
>>>>> features a cluster supports, and then each individual node configures
>>>>> itself (or not) to comply?
>>>>>
>>>>> Is there any intersection between these approaches? At first sight they
>>>>> seem somehow related. How does one differ from the other from your point
>>>>> of view?
>>>>>
>>>>> Regards
>>>>>
>>>>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593
>>>>>
>>>>> On Thu, Dec 19, 2024 at 12:00 AM Jordan West <jw...@apache.org> wrote:
>>>>>
>>>>>> In a recent discussion on the pains of upgrading, one topic that came
>>>>>> up was a feature that Riak had called Capabilities [1]. A major pain with
>>>>>> upgrades is that each node independently decides when to start using new
>>>>>> or modified functionality. Even when we put this behind a config (like
>>>>>> storage compatibility mode), each node immediately enables the feature
>>>>>> when the config is changed and the node is restarted. This causes various
>>>>>> types of upgrade pain such as failed streams and schema disagreement. A
>>>>>> recent example of this is CASSANDRA-20118 [2]. In some cases operators
>>>>>> can prevent this from happening through careful coordination (e.g.
>>>>>> ensuring upgradesstables only runs after the whole cluster is upgraded),
>>>>>> but this typically requires custom code in whatever control plane the
>>>>>> operator is using. A capabilities framework would distribute the state of
>>>>>> what features each node has (and their status, e.g. enabled or not) so
>>>>>> that the cluster can choose to opt in to new features once the whole
>>>>>> cluster has them available. From experience, having this in Riak made
>>>>>> upgrades a significantly less risky process and also paved a path towards
>>>>>> repeatable downgrades. I think Cassandra would benefit from it as well.
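>>>>>>
>>>>>> As a rough illustration of the opt-in idea (all names below are
>>>>>> hypothetical, not a proposed API), the check a feature would make before
>>>>>> switching behaviour might look like:
>>>>>>
>>>>>>     import java.util.Map;
>>>>>>     import java.util.Set;
>>>>>>
>>>>>>     // Sketch: a feature only activates once every node in the cluster
>>>>>>     // advertises support for it, regardless of what the local config says.
>>>>>>     public final class ClusterCapabilities
>>>>>>     {
>>>>>>         // node id -> capabilities that node has advertised, kept up to date
>>>>>>         // by whatever propagation mechanism the CEP settles on
>>>>>>         private final Map<String, Set<String>> advertised;
>>>>>>
>>>>>>         public ClusterCapabilities(Map<String, Set<String>> advertised)
>>>>>>         {
>>>>>>             this.advertised = advertised;
>>>>>>         }
>>>>>>
>>>>>>         /** True only when every known node supports the capability. */
>>>>>>         public boolean supportedClusterWide(String capability)
>>>>>>         {
>>>>>>             return !advertised.isEmpty()
>>>>>>                    && advertised.values().stream().allMatch(caps -> caps.contains(capability));
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     // Usage sketch: gate a new behaviour on the cluster-wide check, e.g.
>>>>>>     // if (capabilities.supportedClusterWide("sstable-format-x")) { ... }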
>>>>>>
>>>>>> Further, other tools like analytics could benefit from having this
>>>>>> information since currently it's up to the operator to manually determine
>>>>>> the state of the cluster in some cases.
>>>>>>
>>>>>> I am considering drafting a CEP proposal for this feature but wanted
>>>>>> to take the general temperature of the community and get some early
>>>>>> thoughts while working on the draft.
>>>>>>
>>>>>> Looking forward to hearing y'alls thoughts,
>>>>>> Jordan
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72
>>>>>>
>>>>>> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
>>>>>>
>>>>>
