Re: [DISCUSS] CEP-42: Constraints Framework

Jon Haddad Thu, 06 Jun 2024 13:23:16 -0700

I think figuring out a roadmap and strategy ahead of time is of high value,
but we don't necessarily have to implement everything all at once.  As long
as we identify the problematic cases, and provide guidance (for example:
disable guardrails or constraints when replaying traffic with FQL), then I
think that's OK.


All I'm after here is to ensure the feature is well thought out so it
doesn't turn into yet another abandoned idea that users trip over that we
can never remove, like materialized views.  Stefan's made a great point
here and I think we should continue poking holes in, and evolving, the
CEP.



On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> It is interesting to see this feedback. In CEP-24 I have been obsessing
> over a user being able to misconfigure the password validation strength,
> so that a user who hits a "weak" node would be able to bypass it. When I
> see our approach here, I am not sure what I was waiting so long for.
> I should probably just be more aggressive with that CEP, and all the
> "caveats" could simply be overlooked and deferred to "sometime later".
>
> On Thu, Jun 6, 2024 at 9:38 PM Doug Rohrer <droh...@apple.com> wrote:
>
>> To me, the difference between system-level guardrails and table-level
>> constraints is the difference between *operational* concerns
>> (guardrails) and *business* concerns (table-level constraints). The two
>> things are only related to one another because they both may limit the
>> value of a field in some way, and there are some limited interactions
>> between the two, but otherwise are essentially unrelated, and *both* have
>> *independent* value.
>>
>> I absolutely agree that trying to make *configuration* somehow
>> transactional/cluster-wide vs. depending on operators to properly configure
>> yaml files on each node is a useful feature. It is, however, a much broader
>> conversation than just guardrails / constraints, and I don’t think the lack
>> of a solution to the “operator misconfigured node X and it has different
>> settings than node Y” problem in any way decreases the value of a table-level
>> constraint system for enforcing use-case-specific constraints on the data
>> in a table.
>>
>> The CEP does say:
>>
>> Interaction with Guardrails Framework
>>
>> As we mentioned in the motivation section, we currently have some
>> guardrails for columns size in place which can be extended for other data
>> types.
>> Those guardrails will take preference over the defined constraints in the
>> schema, and a SCHEMA ALTER adding constraints that break the limits defined
>> by the guardrails framework will fail.
>> If the guardrails themselves are modified, operator should get a warning
>> mentioning that there are schemas with offending constraints.
>>
>>
>> Other than throwing warnings in the logs, I’m not sure how exactly you’d
>> warn the operator that there are schemas w/ offending constraints… but I
>> suppose that would be enough. Given they are settable via JMX, I suppose
>> any time you set one of them it’ll have to scan every constraint definition
>> to make sure it doesn’t somehow violate the new guardrail value, which may
>> require some additional interaction between the two systems, but again, it
>> seems like this would be unrelated to *where* the configuration comes
>> from, and we should be able to isolate it in the initial CEP-42
>> implementation.
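
A minimal sketch of the scan described above, assuming a simplified model in which every constraint is just a maximum size on one column. The names `SizeConstraint` and `offending_constraints` are hypothetical and come from neither Cassandra nor the CEP; the sketch only illustrates re-checking every constraint definition whenever a guardrail value is set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeConstraint:
    """Hypothetical model: a constraint caps the size of one column."""
    table: str
    column: str
    max_size: int


def offending_constraints(constraints, guardrail_max_size):
    """Return the constraints that allow sizes the guardrail forbids."""
    if guardrail_max_size is None:  # guardrail disabled: nothing can offend
        return []
    return [c for c in constraints if c.max_size > guardrail_max_size]


constraints = [
    SizeConstraint("ks.users", "bio", 8),
    SizeConstraint("ks.events", "payload", 15),
]

# Called whenever the guardrail is set (for example via JMX) to a new value.
for c in offending_constraints(constraints, 10):
    print(f"WARN: {c.table}.{c.column} allows size {c.max_size}, above guardrail 10")
```

The check takes only the new guardrail value as input, so it does not care whether that value came from yaml, JMX, or anywhere else.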
>>
>> In summary, I'm not seeing how the new constraint framework would require
>> significant changes if the guardrails system, and configuration more
>> generally, was rewritten to somehow provide a consistent view of the
>> configuration across the cluster. In fact, the implementation of
>> Guardrails already has a “configuration provider” that, by default,
>> happens to wrap the Yaml, but otherwise could pull configuration from other
>> sources, so it’s already fairly insulated from configuration *storage*,
>> which should make changing the underlying storage to something cluster-wide
>> a fairly isolated change.
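
A rough sketch of the insulation argument, under the assumption that the constraint check only ever talks to a small interface. Every name here (`GuardrailConfigSource`, `YamlSource`, `ClusterWideSource`, the `max_value_size` key) is hypothetical and is not the actual Cassandra configuration provider API.

```python
from typing import Optional, Protocol


class GuardrailConfigSource(Protocol):
    """Hypothetical interface: where the limit comes from is hidden."""

    def max_value_size(self) -> Optional[int]: ...


class YamlSource:
    """Per-node settings, as if parsed from that node's yaml file."""

    def __init__(self, settings: dict):
        self._settings = settings

    def max_value_size(self) -> Optional[int]:
        return self._settings.get("max_value_size")


class ClusterWideSource:
    """Settings read from a store shared by all nodes (stand-in: a dict)."""

    def __init__(self, shared_store: dict):
        self._store = shared_store

    def max_value_size(self) -> Optional[int]:
        return self._store.get("max_value_size")


def constraint_allowed(constraint_max: int, source: GuardrailConfigSource) -> bool:
    """The constraint check sees only the interface, never the storage."""
    limit = source.max_value_size()
    return limit is None or constraint_max <= limit


print(constraint_allowed(8, YamlSource({"max_value_size": 10})))
print(constraint_allowed(15, ClusterWideSource({"max_value_size": 10})))
```

Swapping `YamlSource` for `ClusterWideSource` leaves `constraint_allowed` untouched, which is the sense in which the storage change would be isolated.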
>>
>>
>> Doug
>>
>> On Jun 6, 2024, at 2:12 PM, Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>> OK so let's modify that example like this:
>>
>> T0 - a node is started with no guardrails set
>> T1 - guardrail is set via JMX to not allow anything bigger than size of
>> 10 (whatever size means)
>> T2 - a user creates a table with a constraint that anything bigger than
>> size of 8 is forbidden
>> T3 - a user inserts a mutation with size of 7
>> T4 - node is restarted and guardrail in cassandra.yaml is set to forbid
>> sizes bigger than 5
>> T5 - mutation with size of 7 is replayed from FQL and it will fail to
>> replay it because of "global guardrail" in yaml
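
The timeline above can be walked through with a toy model. This is only a sketch: the `Node` class is hypothetical, "size" is a plain integer, the FQL is a list of accepted write sizes, and it assumes the table constraint survives the restart while the JMX-set guardrail is replaced by the yaml value.

```python
class Node:
    """Toy node: one global guardrail, one table constraint, one FQL."""

    def __init__(self, guardrail_max=None):
        self.guardrail_max = guardrail_max
        self.constraint_max = None
        self.fql = []  # sizes of writes that were accepted

    def add_constraint(self, max_size):
        # Per the CEP text, a constraint looser than the guardrail is rejected.
        if self.guardrail_max is not None and max_size > self.guardrail_max:
            raise ValueError(f"constraint {max_size} exceeds guardrail {self.guardrail_max}")
        self.constraint_max = max_size

    def write(self, size):
        limits = (("guardrail", self.guardrail_max), ("constraint", self.constraint_max))
        for name, limit in limits:
            if limit is not None and size > limit:
                raise ValueError(f"size {size} exceeds {name} limit {limit}")
        self.fql.append(size)


node = Node()                     # T0: no guardrails set
node.guardrail_max = 10           # T1: guardrail set via JMX
node.add_constraint(8)            # T2: table constraint of 8
node.write(7)                     # T3: accepted and recorded in the FQL

restarted = Node(guardrail_max=5) # T4: restart, yaml guardrail is now 5
restarted.constraint_max = 8      # assumption: schema constraint persists

failures = []
for size in node.fql:             # T5: replay the recorded traffic
    try:
        restarted.write(size)
    except ValueError as e:
        failures.append(str(e))

print(failures)  # ['size 7 exceeds guardrail limit 5']
```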
>>
>> In general, the problem I see with this CEP is that I feel like we
>> clearly see that it is a little bit hairy around the configuration and it
>> _can_ be broken or misconfigured etc but the feedback I see is that "yeah
>> but ... it is possible to break it already, so what?"
>>
>> I do not follow this logic. If we see that it "leaks", why is the leakage
>> an excuse to put more features on top of that? Shouldn't we fix the
>> leakage in the first place? Why is that an excuse? I don't get that ... It
>> is like "yeah, it is broken, so putting more stuff on top of it can't make
>> it worse".
>>
>> What if we focused our effort on making configuration transactional, or
>> at least tried to fix this problem so it does not happen? If we do not do
>> that before we introduce this, then we will have more work to do once we
>> get around to addressing it, and by then it might be too late because we
>> will need to live with all our earlier decisions, however ineffective they
>> might be.
>>
>>
>>
>> On Thu, Jun 6, 2024 at 7:33 PM Yifan Cai <yc25c...@gmail.com> wrote:
>>
>>> Hi Stefan,
>>>
>>> Thanks for putting together the FQL example! However, it seems to be
>>> incorrect. FQL only records the _successful_ queries. The query at T4
>>> fails, and it will not be included in the FQL log.
>>> I do agree that changing guardrails on the fly can cause confusion when
>>> FQL is enabled on the node. Operators should probably avoid doing so. But
>>> it seems unrelated to constraints. Besides, there are value size
>>> guardrails, i.e. columnValueSize and collectionSize, available in
>>> Cassandra already.
>>>
>>> On extensibility, I agree that the CEP should make it clear what
>>> constraints are included and how they work. My understanding is that it
>>> wants to have size check and value check, which are useful for most cases.
>>>
>>> - Yifan
>>>
>>> On Thu, Jun 6, 2024 at 9:25 AM Štefan Miklošovič <
>>> stefan.mikloso...@gmail.com> wrote:
>>>
>>>> Another problem with this constraints feature is that if it does not
>>>> solely rely on constraints in CQL, then it would be non-deterministic if we
>>>> want to replay all mutations from an FQL log.
>>>>
>>>> Let's take this into consideration (T = time)
>>>>
>>>> T0 - a node is started with no guardrails set
>>>> T1 - guardrail is set via JMX to not allow anything bigger than size of
>>>> 10 (whatever size means)
>>>> T2 - a user creates a table with a constraint that anything bigger than
>>>> size of 8 is forbidden
>>>> T3 - a user inserts a mutation with size of 5
>>>> T4 - a user modifies a table to set the constraint in such a way that
>>>> anything bigger than size of 15 is forbidden - this will fail because we
>>>> have a guardrail that anything bigger than 10 is forbidden from T1.
>>>>
>>>> Then we gather the FQL log and restart the node. As guardrails set via
>>>> JMX do not survive restarts for now, when we replay the log, T4 will be
>>>> replayed too, but it should not be.
>>>>
>>>> Is this correct?
>>>>
>>>> On Thu, Jun 6, 2024 at 9:49 AM Štefan Miklošovič <
>>>> stefan.mikloso...@gmail.com> wrote:
>>>>
>>>>> I agree with Jon that a detailed description of all constraints to be
>>>>> introduced is necessary. Only to say that it will be extensible so we can
>>>>> add other constraints later is not enough. What other constraints?
>>>>>
>>>>> On Thu, Jun 6, 2024 at 6:24 AM Jon Haddad <j...@jonhaddad.com> wrote:
>>>>>
>>>>>> I think there are some promising ideas here, but the CEP needs to be
>>>>>> developed a bit more.
>>>>>>
>>>>>> > Other types of constraints and functions can be added in the
>>>>>> > future to provide even more flexibility, but are out of the scope of
>>>>>> > this CEP.
>>>>>>
>>>>>> > For the third point, I didn’t want to be prescriptive on what those
>>>>>> > validations should be, but the fact that the proposal is extensible to
>>>>>> > those potential use cases is something concrete that, in my opinion,
>>>>>> > comes as a benefit of the actual proposal. I’d be happy to develop the
>>>>>> > main example used, sizeOf, a bit more if it helps alleviate your
>>>>>> > concerns on this point.
>>>>>>
>>>>>> I disagree, quite strongly, with this.  While I appreciate
>>>>>> extensibility, I think having a variety of actual constraints that ship
>>>>>> with the feature means it needs to be built to satisfy real world use
>>>>>> cases.  Without going through this process, it feels a bit too much like
>>>>>> triggers, UDAs and UDFs  - incomplete, and too much left to the end user.
>>>>>>
>>>>>> To me, punting on thinking through constraints kicks the most
>>>>>> important can down the road.
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella <
>>>>>> conta...@bernardobotella.com> wrote:
>>>>>>
>>>>>>> In the CEP document there is another example (although not explicitly
>>>>>>> mentioned) adding a constraint to the max value of an int ->
>>>>>>> `number_of_items int CONSTRAINT number_of_items < 1000`
>>>>>>>
>>>>>>> This basic example can also be used to expand on how to extend this
>>>>>>> functionality with these two initial constraints (size and value), by
>>>>>>> composing them to create new data types with proper validation.
>>>>>>>
>>>>>>> For example, this could create an ipv4 with built in validation:
>>>>>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>>>>>   ip_address inet,
>>>>>>>   subnet_mask int,
>>>>>>>   CONSTRAINT subnet_mask > 0,
>>>>>>>   CONSTRAINT subnet_mask < 32
>>>>>>> )
>>>>>>>
>>>>>>> Or a color type:
>>>>>>> CREATE TYPE keyspace.color (
>>>>>>>   r int,
>>>>>>>   g int,
>>>>>>>   b int,
>>>>>>>   CONSTRAINT r >= 0,
>>>>>>>   CONSTRAINT r <= 255,
>>>>>>>   CONSTRAINT g >= 0,
>>>>>>>   CONSTRAINT g <= 255,
>>>>>>>   CONSTRAINT b >= 0,
>>>>>>>   CONSTRAINT b <= 255
>>>>>>> )
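
To make the composition idea concrete, here is a small sketch of how a set of value constraints could be evaluated against one value of such a type. It assumes each constraint is a simple (field, operator, bound) triple; the `violations` helper is hypothetical and says nothing about how CEP-42 would implement the check.

```python
import operator

# Comparison operators a value constraint may use in this sketch.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def violations(value, constraints):
    """Return the constraints, rendered as text, that the value fails."""
    return [
        f"{field} {op} {bound}"
        for field, op, bound in constraints
        if not OPS[op](value[field], bound)
    ]


# The two subnet_mask constraints from the first example above.
cidr_constraints = [("subnet_mask", ">", 0), ("subnet_mask", "<", 32)]

print(violations({"subnet_mask": 24}, cidr_constraints))  # []
print(violations({"subnet_mask": 40}, cidr_constraints))  # ['subnet_mask < 32']
```

A write would be rejected whenever the returned list is non-empty, and the list itself can be used to build the error message.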
>>>>>>>
>>>>>>>
>>>>>>> Other types of constraints and functions can be added in the
>>>>>>> future to provide even more flexibility, but are out of the scope of
>>>>>>> this CEP.
>>>>>>>
>>>>>>> Bernardo
>>>>>>>
>>>>>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad <j...@jonhaddad.com> wrote:
>>>>>>>
>>>>>>> The idea is interesting.  I think it would help to have more
>>>>>>> concrete examples.  It's a bit sparse at the moment, and I have a hard
>>>>>>> time getting on board with new features where the main selling point
>>>>>>> is extensibility over the value they provide on their own.
>>>>>>>
>>>>>>> I think it would help a lot if we knew what types of constraints,
>>>>>>> besides the size check, you were thinking of adding.
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
>>>>>>> conta...@bernardobotella.com> wrote:
>>>>>>>
>>>>>>>> Yes, that is correct. This particular behavior will need CEP-24 in
>>>>>>>> order to work reliably. But, if my understanding is correct, that
>>>>>>>> statement holds true for the entirety of Guardrails, and not only for
>>>>>>>> this particular feature.
>>>>>>>>
>>>>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>>>>>>>> stefan.mikloso...@netapp.com> wrote:
>>>>>>>>
>>>>>>>> That would work reliably only if there is no way to misconfigure
>>>>>>>> guardrails in the cluster. What if you set a guardrail on one node but
>>>>>>>> you don’t set it (or set it differently) on another? If it is
>>>>>>>> configured differently and you want to check that constraints do not
>>>>>>>> violate the guardrails, then your query might fail or not depending on
>>>>>>>> which node is hit.
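
A tiny sketch of that failure mode, assuming each node validates a new constraint against its own local guardrail value. The node names and the `alter_accepted` helper are hypothetical.

```python
def alter_accepted(node_guardrail_max, constraint_max):
    """A node accepts the constraint if it fits that node's own guardrail."""
    return node_guardrail_max is None or constraint_max <= node_guardrail_max


# Same cluster, three different local configurations (None = guardrail unset).
guardrails = {"node1": 10, "node2": None, "node3": 5}

# The same ALTER, adding a constraint of size 8, sent to each node in turn.
outcomes = {node: alter_accepted(limit, 8) for node, limit in guardrails.items()}
print(outcomes)  # {'node1': True, 'node2': True, 'node3': False}
```

The statement is identical in all three cases; only the coordinator that happens to receive it decides the outcome.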
>>>>>>>>
>>>>>>>> I guess that guardrails would need to become transactional to be sure
>>>>>>>> this is avoided and guardrails are indeed the same everywhere (see the
>>>>>>>> CEP-24 thread sent recently here in the ML).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From: *Bernardo Botella <conta...@bernardobotella.com>
>>>>>>>> *Date: *Tuesday, 4 June 2024 at 00:31
>>>>>>>> *To: *dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>>>>> *Cc: *Miklosovic, Stefan <stefan.mikloso...@netapp.com>
>>>>>>>> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework
>>>>>>>>
>>>>>>>> Basically, I am trying to protect the limits set by the operator
>>>>>>>> against misconfigured schemas from the customers.
>>>>>>>>
>>>>>>>> I see the guardrails as a safety limit added by the operator,
>>>>>>>> setting the limits within which the customers owning the actual schema
>>>>>>>> (and their constraints) can operate. With that vision, if a customer
>>>>>>>> tries to “ignore” the actual limits set by the operator by adding more
>>>>>>>> relaxed constraints, they get a nice message saying “that is not
>>>>>>>> allowed for the cluster, please contact your admin”.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev <
>>>>>>>> dev@cassandra.apache.org> wrote:
>>>>>>>>
>>>>>>>> You wrote in the CEP:
>>>>>>>>
>>>>>>>> As we mentioned in the motivation section, we currently have some
>>>>>>>> guardrails for columns size in place which can be extended for other
>>>>>>>> data types.
>>>>>>>> Those guardrails will take preference over the defined constraints
>>>>>>>> in the schema, and a SCHEMA ALTER adding constraints that break the
>>>>>>>> limits defined by the guardrails framework will fail.
>>>>>>>> If the guardrails themselves are modified, operator should get a
>>>>>>>> warning mentioning that there are schemas with offending constraints.
>>>>>>>>
>>>>>>>> I think that this should be the other way around. Guardrails should
>>>>>>>> kick in when there are no constraints, and they would be overridden by
>>>>>>>> the table schema. That way, there is always a “default” in terms of
>>>>>>>> guardrails (which one can turn off on demand / change) but you can
>>>>>>>> override it by altering the table.
>>>>>>>>
>>>>>>>> Basically, what is in the schema should win regardless of how
>>>>>>>> guardrails are configured. They don’t matter when a constraint is
>>>>>>>> explicitly specified in a schema. The defaults from guardrails, if
>>>>>>>> there are any, should apply only when no constraint is specified at the
>>>>>>>> schema level.
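
The two precedence policies under discussion can be put side by side in a short sketch. It assumes both the guardrail and the constraint are plain maximum sizes, with `None` meaning "not set"; both function names are hypothetical.

```python
def limit_guardrail_wins(guardrail_max, constraint_max):
    """CEP wording: the guardrail caps whatever the schema says."""
    limits = [x for x in (guardrail_max, constraint_max) if x is not None]
    return min(limits) if limits else None


def limit_schema_wins(guardrail_max, constraint_max):
    """Alternative: the guardrail is only a default when no constraint exists."""
    return constraint_max if constraint_max is not None else guardrail_max


# Guardrail of 10, table constraint of 15: the policies disagree.
print(limit_guardrail_wins(10, 15), limit_schema_wins(10, 15))      # 10 15
# Guardrail of 10, no constraint: both fall back to the guardrail.
print(limit_guardrail_wins(10, None), limit_schema_wins(10, None))  # 10 10
```

The policies only differ when a constraint is looser than the guardrail, which is exactly the case the CEP proposes to reject at ALTER time.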
>>>>>>>>
>>>>>>>> What is your motivation to do it like you suggested?
>>>>>>>>
>>>>>>>>
>>>>>>>> *From: *Bernardo Botella <conta...@bernardobotella.com>
>>>>>>>> *Date: *Friday, 31 May 2024 at 23:24
>>>>>>>> *To: *dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>>>>> *Subject: *[DISCUSS] CEP-42: Constraints Framework
>>>>>>>>
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> I am proposing this CEP:
>>>>>>>> CEP-42: Constraints Framework - CASSANDRA - Apache Software
>>>>>>>> Foundation
>>>>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework>
>>>>>>>>
>>>>>>>>
>>>>>>>> And I’m looking for feedback from the community.
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>> Bernardo
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>
