Re: [DISCUSS] CEP-42: Constraints Framework

Claude Warren, Jr via dev Wed, 12 Jun 2024 03:34:22 -0700

>
> 2)
> "Is part of an enum" makes up for the lack of enum types. The constraint
> could be something like CONSTRAINT belongsToEnum([list of valid values],
> field):
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field),
>   ...
> );
> 3)
> Similarly, we can check and reject if a term is part of a list of blocked
> terms:
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'],
> field),
>   ...
> );



Are these not just "CONSTRAINT inList([list of valid values], field);" and
"CONSTRAINT not inList([list of blocked values], field);"?
At this point doesn't "CONSTRAINT p1 != p2" devolve to "CONSTRAINT not
inList([p1], p2);"?
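
A minimal sketch of that equivalence, written in Python rather than CQL
because the constraint syntax is still only proposed. The names in_list,
negate, belongs_to_enum, is_not_blocked and differs are hypothetical and
are not part of CEP-42 or of any Cassandra API; the sketch only shows that
the three proposed checks can be built from one membership test plus
negation.

```python
from typing import Any, Callable, Sequence

# Hypothetical helpers for illustration only; not CEP-42 or Cassandra APIs.
Check = Callable[[Any], bool]


def in_list(values: Sequence[Any]) -> Check:
    """Build a check that accepts a value only if it appears in `values`."""
    members = list(values)
    return lambda value: value in members


def negate(check: Check) -> Check:
    """Build the logical negation of an existing check ("not inList")."""
    return lambda value: not check(value)


# belongsToEnum(['foo', 'foo2'], field) as inList
belongs_to_enum = in_list(["foo", "foo2"])

# isNotBlocked(['blocked_foo', 'blocked_foo2'], field) as not inList
is_not_blocked = negate(in_list(["blocked_foo", "blocked_foo2"]))


def differs(p1: Any, p2: Any) -> bool:
    """p1 != p2 expressed as not inList([p1], p2)."""
    return negate(in_list([p1]))(p2)
```

Under these assumptions, a write would be rejected whenever the relevant
check returns False for the incoming value.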

Can "[List of values]" point to a variable containing a list?  Or does it
require hard coding in the constraint itself?



On Tue, Jun 11, 2024 at 6:23 PM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> Hi Štefan,
>
> I'll address the different points:
> 1)
> An example (possibly a stretch) of a use case for the != constraint would be:
> Let's say you have a table in which you want to record a movement, from
> position p1 to position p2. You may want to check that those two are
> different to make sure there is actual movement.
>
> CREATE TABLE keyspace.table (
>   p1 int,
>   p2 int,
>   ...,
>   CONSTRAINT p1 != p2
> );
>
> For the case of ==, I agree that it is harder to come up with a valid use
> case, and I added it for completeness.
>
> 2)
> "Is part of an enum" makes up for the lack of enum types. The constraint
> could be something like CONSTRAINT belongsToEnum([list of valid values],
> field):
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field),
>   ...
> );
>
> 3)
> Similarly, we can check and reject if a term is part of a list of blocked
> terms:
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'],
> field),
>   ...
> );
>
> Please let me know if this helps,
> Bernardo
>
>
>
> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
> Hi Bernardo,
>
> 1) Could you elaborate on these two constraints?
>
> == and != ?
>
> What is the use case? Why would I want to have data in a database stored
> in some column which would need to be _same as my constraint_ and which
> _could not_ be same as my constraint? Can you give me at least one example
> of each? It looks like I am going to put a constant into a database in case
> of ==, wouldn't a static column be better?
>
> 2) For examples of text based types you mentioned: "is part of an enum" -
> how would you enforce this in Cassandra? What enum do we have in CQL?
> 3) What does "is it block listed" mean?
>
> In the meantime, I made changes to CEP-24 to move transactionality into
> optional features.
>
> On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> Hi everyone,
>>
>> After the feedback, I'd like to make a recap of what we have discussed in
>> this thread and try to move forward with the conversation.
>>
>> I made some clarifications:
>> - Constraints are only applied at write time.
>> - Guardrail configurations should take precedence over what's being
>> defined as a constraint.
>>
>> *Specify constraints:*
>> There is general feedback about adding more concrete examples than the
>> ones that can be found in the CEP document.
>> Basically, the initial constraints I am proposing are:
>> - SizeOf Constraint for String types, as in
>> name text CONSTRAINT sizeOf(name) < 256
>>
>> - Value Constraint for numeric types
>> number_of_items int CONSTRAINT number_of_items < 1000
>>
>> Those two alone and combined provide a lot of flexibility, and allow
>> complex validations that enable "new types" such as:
>>
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_address inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>>
>> CREATE TYPE keyspace.color (
>>   r int,
>>   g int,
>>   b int,
>>   CONSTRAINT r >= 0,
>>   CONSTRAINT r < 255,
>>   CONSTRAINT g >= 0,
>>   CONSTRAINT g < 255,
>>   CONSTRAINT b >= 0,
>>   CONSTRAINT b < 255
>> )
>>
>>
>> Those two initial Constraints are the fundamental constraints that would
>> give value to the feature. The framework can (and will) be extended with
>> other Constraints, leaving us with the following:
>>
>> For numeric types:
>> - Max (<)
>> - Min (>)
>> - Equality (==)
>> - Difference (!=)
>>
>> For date types:
>> - Before (<)
>> - After (>)
>>
>> For text based types:
>> - Size (sizeOf)
>> - isJson (is the text a json?)
>> - complies with a given pattern
>> - Is it block listed?
>> - Is it part of an enum?
>>
>> General table constraints (including more than one column):
>> - Compare between numeric types (a < b, a > b, a != b, …)
>> - Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
>>
>> I have updated the CEP with this information.
>>
>> *Potential dependency on CEP-24:*
>> Given that the Constraints Framework provides a set of checks to be
>> performed alongside those that can be made using the Guardrails framework,
>> there may be some relation with CEP-24, which mentions transactional
>> Guardrails to prevent situations in which the limit configurations are
>> different across the cluster.
>>
>> This CEP-42 is not proposing modifying the Guardrails framework, and
>> therefore should not be affected by CEP-24. It is true that the
>> improvements provided by CEP-24 would benefit this Constraints framework,
>> but it is not dependent on them.
>>
>>
>> I hope I included all the points and addressed them in the CEP. If not,
>> please call it out and I’ll be more than happy to include it.
>>
>> Thanks everyone for all the input!
>> Bernardo
>>
>> On Jun 7, 2024, at 11:54 AM, Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>> The way I see it, 5.1 will ship TCM for the very first time, and I do not
>> think that config in TCM will make it into 5.1 based on what Sam says
>> (the need for some stability etc.), which makes total sense to me.
>> TCM is quite a big feature to deliver on its own, and putting even more
>> stuff into it might be detrimental to the quality if we rush it.
>>
>> Then sometime after 5.1 we might take a serious look at config in TCM
>> itself.
>>
>> My plan, ideally, is to still ship CEP-24 without config in TCM, then
>> after 5.1 when config in TCM lands, CEP-24 might integrate with that on a
>> deeper level.
>>
>> If CEP-42 (this one) makes it into 5.1 as well, I think a similar approach
>> might be taken for it as well (integration with guardrails).
>>
>> On Fri, Jun 7, 2024 at 8:49 PM Sam Tunnicliffe <s...@beobal.com> wrote:
>>
>>> We've been working on a draft CEP for migrating config from yaml to
>>> cluster metadata but have been a bit short of time recently, I'll try to
>>> get something out for discussion as soon as possible.
>>> A little delay isn't such a bad thing IMO, as we're still ironing out
>>> the kinks in the TCM implementation itself. It'd be good to get a bit more
>>> road testing done with that before we start adding more to it, which I'm
>>> sure will start to ramp up once 5.0 is out.
>>>
>>> Thanks,
>>> Sam
>>>
>>> On 7 Jun 2024, at 19:19, Štefan Miklošovič <stefan.mikloso...@gmail.com>
>>> wrote:
>>>
>>> Yes, all configuration should be transactional (configuration which
>>> makes sense to require to be the same cluster-wide). Guardrails in TCM are
>>> just a subset of this problem. When I started to do CEP-24 I started with
>>> guardrails in TCM but then I realized it leads to more general "all config
>>> in TCM" and I found myself rabbit-hole-ing endlessly.
>>>
>>> BTW I do not think that, once CEP-24 is in place without guardrails in
>>> TCM, implementing that later would blow things up a lot. It is really just
>>> about a couple of mutable virtual tables and a couple of transformations
>>> for the various guardrail types we have, but I expect that its integration
>>> into more general config in TCM should be rather straightforward.
>>>
>>> Config in TCM definitely deserves its own CEP; it is too much to handle
>>> under CEP-24, and CEP-24 can already go ahead without it. It just takes a
>>> little bit more configuration acumen to nail it down correctly.
>>>
>>> Regards
>>>
>>> On Fri, Jun 7, 2024 at 8:12 PM Doug Rohrer <droh...@apple.com> wrote:
>>>
>>>> There’s a difference between the two though. Constraints are part of
>>>> the table schema, and (independent of the interaction with Guardrails),
>>>> have no dependency on yaml files being perfectly in sync across the
>>>> cluster. Therefore, the feature (Constraints) on its own doesn’t depend on
>>>> configuration files to be correct in its own right. The only place where
>>>> this isn’t true is its interaction with Guardrails, which happen to be
>>>> yaml-file based and cause issues.
>>>>
>>>> CEP-24’s password length requirements, however, are intended to be
>>>> implemented *by adding a new guardrail*, which is totally dependent on
>>>> YAML files today (and thus the concerns around a single misconfigured
>>>> server allowing someone to use an insecure password). If CEP-24 fixes
>>>> guardrails’ dependence on yaml files, it would *also* fix the
>>>> problematic interaction between guardrails and constraints.
>>>>
>>>> I agree that it would be incredibly valuable to find a solution to the
>>>> “yaml files need to be correct everywhere or something breaks” problem, and
>>>> I think CEP-24, being security-focused, is more likely to be problematic
>>>> without a solution to this issue. That said, I think Dinesh is right in
>>>> that, at the end of the day, CEP-24 could be implemented without fixing the
>>>> yaml config issue.
>>>>
>>>> I do wonder if the “Guardrails should be transactional” should really
>>>> be “configuration should be transactional”, or at least as much config as
>>>> possible should be, but that would blow up CEP-24 fairly dramatically
>>>> (maybe?). Maybe “cluster-wide configuration should be read from a
>>>> distributed source on startup/joining the cluster” or something would make
>>>> sense, so the yaml file works as the source of truth on startup, but as
>>>> soon as possible it’s read from a TCM-backed data source, and anything the
>>>> node can get from other nodes it would… but now I’m designing a different
>>>> CEP in a discuss thread, which is probably a bad idea...
>>>>
>>>> Regardless, I hope that I’m explaining why I see a difference between
>>>> constraints and guardrails, and why I think it makes sense that constraints
>>>> can move forward without a solution to the misconfiguration problem, while I
>>>> also think you were right in calling it out in CEP-24 (even if we
>>>> eventually move forward on CEP-24 without the solution in place).
>>>>
>>>> Doug
>>>>
>>>>
>>>>
>>>> On Jun 7, 2024, at 1:51 AM, Dinesh Joshi <djo...@apache.org> wrote:
>>>>
>>>> On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič <
>>>> stefan.mikloso...@gmail.com> wrote:
>>>>
>>>>> It is interesting to see this feedback. When I look at CEP-24 where I
>>>>> am obsessing about a user being able to misconfigure the password
>>>>> validation strength so if a user hits a "weak" node then she would be able
>>>>> to bypass it, and I see what is our approach here, then I am not sure what
>>>>> I was waiting so long for and I should probably be just more aggressive
>>>>> with the CEP and all the "caveats" could be just overlooked and deferred to
>>>>> "sometimes later".
>>>>>
>>>>
>>>> Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS
>>>> thread. Had I paid attention, I would have pointed out that waiting on TCM
>>>> doesn't make the feature any different; it only makes the feature less
>>>> likely to be misconfigured in a cluster. CEP-24 is valuable, and password
>>>> compliance with policies is a super useful feature which IMO shouldn't have
>>>> been held back due to lack of TCM.
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
