Re: [DISCUSS] CEP-42: Constraints Framework

Štefan Miklošovič Tue, 11 Jun 2024 06:29:46 -0700

Hi Bernardo,

1) Could you elaborate on these two constraints?


== and != ?

What is the use case? Why would I want a column whose stored value must be
_the same as my constraint_, or one whose value _must not_ be the same as my
constraint? Can you give me at least one example of each? With ==, it looks
like I would be putting a constant into the database; wouldn't a static
column be better?

2) For examples of text based types you mentioned: "is part of an enum" -
how would you enforce this in Cassandra? What enum do we have in CQL?
3) What does "is it block listed" mean?
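
To make the three questions above concrete, here is a minimal Python sketch of
how these checks could be read as write-time predicates. It is only an
illustration of the semantics being asked about, not the CEP's design: every
name in it (equals, not_equals, in_enum, not_block_listed, violations) and the
example columns are assumptions made up for this sketch.

```python
# Hypothetical sketch: none of these names exist in Cassandra or in the CEP.
from typing import Callable, Dict, List

Check = Callable[[object], bool]


def equals(constant) -> Check:
    # "==": exactly one value is ever accepted, so the column holds a constant.
    return lambda value: value == constant


def not_equals(constant) -> Check:
    # "!=": every value except one is accepted.
    return lambda value: value != constant


def in_enum(allowed) -> Check:
    # "is part of an enum", read here as membership in a fixed set of values.
    allowed = frozenset(allowed)
    return lambda value: value in allowed


def not_block_listed(blocked) -> Check:
    # "is it block listed", read here as rejecting a fixed set of values.
    blocked = frozenset(blocked)
    return lambda value: value not in blocked


def violations(row: Dict[str, object],
               constraints: Dict[str, List[Check]]) -> List[str]:
    """Return the columns present in `row` whose value fails any check."""
    return [column for column, checks in constraints.items()
            if column in row and not all(check(row[column]) for check in checks)]


# Example column names and values are assumptions for illustration only.
constraints = {
    "status": [in_enum({"ACTIVE", "SUSPENDED"})],
    "username": [not_block_listed({"root", "admin"})],
    "version": [equals(1)],
    "retries": [not_equals(0)],
}
```

Under this reading, a write is rejected as soon as violations(...) returns a
non-empty list, and the == case makes it visible that the column can only ever
hold a single value.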

In the meantime, I made changes to CEP-24 to move transactionality into
optional features.

On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> Hi everyone,
>
> After the feedback, I'd like to make a recap of what we have discussed in
> this thread and try to move forward with the conversation.
>
> I made some clarifications:
> - Constraints are only applied at write time.
> - Guardrail configurations should maintain preference over what's being
> defined as a constraint.
>
> *Specify constraints:*
> The general feedback was to add more concrete examples than the ones in the
> CEP document.
> Basically, the initial constraints I am proposing are:
> - SizeOf Constraint for String types, as in
> name text CONSTRAINT sizeOf(name) < 256
>
> - Value Constraint for numeric types
> number_of_items int CONSTRAINT number_of_items < 1000
>
> Those two alone and combined provide a lot of flexibility, and allow
> complex validations that enable "new types" such as:
>
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_address inet,
>   subnet_mask int,
>   CONSTRAINT subnet_mask > 0,
>   CONSTRAINT subnet_mask < 32
> )
>
> CREATE TYPE keyspace.color (
>   r int,
>   g int,
>   b int,
>   CONSTRAINT r >= 0,
>   CONSTRAINT r < 255,
>   CONSTRAINT g >= 0,
>   CONSTRAINT g < 255,
>   CONSTRAINT b >= 0,
>   CONSTRAINT b < 255
> )
>
>
> Those two initial Constraints are the fundamental constraints that would
> give value to the feature. The framework can (and will) be extended with
> other Constraints, leaving us with the following:
>
> For numeric types:
> - Max (<)
> - Min (>)
> - Equality (==)
> - Difference (!=)
>
> For date types:
> - Before (<)
> - After (>)
>
> For text based types:
> - Size (sizeOf)
> - isJson (is the text a json?)
> - complies with a given pattern
> - Is it block listed?
> - Is it part of an enum?
>
> General table constraints (including more than one column):
> - Compare between numeric types (a < b, a > b, a != b, …)
> - Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
>
> I have updated the CEP with this information.
>
> *Potential dependency on CEP-24:*
> Given that the Constraints Framework provides a set of checks to be
> performed alongside those that can be made using the Guardrails framework,
> there may be some relation with CEP-24, which mentions transactional
> Guardrails to prevent situations in which the limit configurations are
> different across the cluster.
>
> This CEP-42 is not proposing modifying the Guardrails framework, and
> therefore should not be affected by CEP-24. It is true that the
> improvements provided by CEP-24 would benefit this Constraints framework,
> but it is not dependent on them.
>
>
> I hope I included all the points and addressed them on the CEP, otherwise,
> please call it out and I’ll be more than happy to include it.
>
> Thanks everyone for all the inputs!
> Bernardo
>
> On Jun 7, 2024, at 11:54 AM, Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
> The way I see it, 5.1 will ship TCM for the very first time, and I do not
> think that config in TCM will make it into 5.1, based on what Sam says
> (the need for some stability etc.), which makes total sense to me.
> TCM is quite a big feature to deliver on its own, and putting even more
> into it might be detrimental to the quality if we rush it.
>
> Then sometime after 5.1 we might take a serious look at config in TCM
> itself.
>
> My plan, ideally, is to still ship CEP-24 without config in TCM, then
> after 5.1 when config in TCM lands, CEP-24 might integrate with that on a
> deeper level.
>
> If CEP-42 (this one) makes it into 5.1 as well, I think something similar
> could be done for it as well (integration with guardrails).
>
> On Fri, Jun 7, 2024 at 8:49 PM Sam Tunnicliffe <s...@beobal.com> wrote:
>
>> We've been working on a draft CEP for migrating config from yaml to
>> cluster metadata but have been a bit short of time recently, I'll try to
>> get something out for discussion as soon as possible.
>> A little delay isn't such a bad thing IMO, as we're still ironing out the
>> kinks in the TCM implementation itself. It'd be good to get a bit more road
>> testing done with that before we start adding more to it, which I'm sure
>> will start to ramp up once 5.0 is out.
>>
>> Thanks,
>> Sam
>>
>> On 7 Jun 2024, at 19:19, Štefan Miklošovič <stefan.mikloso...@gmail.com>
>> wrote:
>>
>> Yes, all configuration should be transactional (configuration which makes
>> sense to require to be the same cluster-wide). Guardrails in TCM are just a
>> subset of this problem. When I started to do CEP-24 I started with
>> guardrails in TCM but then I realized it leads to more general "all config
>> in TCM" and I found myself rabbit-hole-ing endlessly.
>>
>> BTW I do not think that once CEP-24 is in place without guardrails in TCM
>> then implementing it would blow up things a lot. It is really just about a
>> couple mutable virtual tables and a couple transformations for various
>> guardrail types we have but I expect that its integration into more general
>> config in TCM should be rather straightforward.
>>
>> Config in TCM definitely deserves its own CEP, it is too much to handle
>> under CEP-24 and CEP-24 can go without it already. It just takes a little
>> bit more configuration acumen to nail it down correctly.
>>
>> Regards
>>
>> On Fri, Jun 7, 2024 at 8:12 PM Doug Rohrer <droh...@apple.com> wrote:
>>
>>> There’s a difference between the two though. Constraints are part of the
>>> table schema, and (independent of the interaction with Guardrails), have no
>>> dependency on yaml files being perfectly in sync across the cluster.
>>> Therefore, the feature (Constraints) on its own doesn’t depend on
>>> configuration files to be correct in its own right. The only place where
>>> this isn’t true is its interaction with Guardrails, which happen to be
>>> yaml-file based and cause issues.
>>>
>>> CEP-24’s password length requirements, however, are intended to be
>>> implemented *by adding a new guardrail*, which is totally dependent on
>>> YAML files today (and thus the concerns around a single misconfigured
>>> server allowing someone to use an insecure password). If CEP-24 fixes
>>> guardrails’ dependence on yaml files, it would *also* fix the
>>> problematic interaction between guardrails and constraints.
>>>
>>> I agree that it would be incredibly valuable to find a solution to the
>>> “yaml files need to be correct everywhere or something breaks” problem, and
>>> I think CEP-24, being security-focused, is more likely to be problematic
>>> without a solution to this issue. That said, I think Dinesh is right in
>>> that, at the end of the day, CEP-24 could be implemented without fixing the
>>> yaml config issue.
>>>
>>> I do wonder if the “Guardrails should be transactional” should really be
>>> “configuration should be transactional”, or at least as much config as
>>> possible should be, but that would blow up CEP-24 fairly dramatically
>>> (maybe?). Maybe “cluster-wide configuration should be read from a
>>> distributed source on startup/joining the cluster” or something would make
>>> sense, so the yaml file works as the source of truth on startup, but as
>>> soon as possible it’s read from a TCM-backed data source, and anything the
>>> node can get from other nodes it would… but now I’m designing a different
>>> CEP in a discuss thread, which is probably a bad idea...
>>>
>>> Regardless, I hope that I’m explaining why I see a difference between
>>> constraints and guardrails, and why I think it makes sense that constraints
>>> can move forward without a solution to the misconfiguration problem, while
>>> I also think you were right in calling it out in CEP-24 (even if we
>>> eventually move forward on CEP-24 without the solution in place).
>>>
>>> Doug
>>>
>>>
>>>
>>> On Jun 7, 2024, at 1:51 AM, Dinesh Joshi <djo...@apache.org> wrote:
>>>
>>> On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič <
>>> stefan.mikloso...@gmail.com> wrote:
>>>
>>>> It is interesting to see this feedback. In CEP-24 I have been obsessing
>>>> about a user being able to misconfigure the password validation strength,
>>>> so that a user who hits a "weak" node would be able to bypass it. When I
>>>> see what our approach is here, I am not sure what I was waiting so long
>>>> for; I should probably just be more aggressive with the CEP, and all the
>>>> "caveats" could be overlooked and deferred to "sometime later".
>>>>
>>>
>>> Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS thread.
>>> Had I paid attention, I would have pointed out that waiting on TCM doesn't
>>> make the feature any different; it only makes the feature less likely to
>>> be misconfigured
>>> in a cluster. CEP-24 is valuable and password compliance with policies is a
>>> super useful feature which IMO shouldn't have been held back due to lack of
>>> TCM.
>>>
>>>
>>>
>>>
>>
>
