Re: [DISCUSS] CEP-42: Constraints Framework

Štefan Miklošovič Thu, 06 Jun 2024 13:03:03 -0700

It is interesting to see this feedback. In CEP-24 I have been obsessing
over a user being able to misconfigure the password validation strength,
so that a user who hits a "weak" node would be able to bypass it. When I
see what our approach is here, I am not sure what I was waiting so long
for. I should probably just be more aggressive with the CEP, and all the
"caveats" could simply be overlooked and deferred to "sometime later".


On Thu, Jun 6, 2024 at 9:38 PM Doug Rohrer <droh...@apple.com> wrote:

> To me, the difference between system-level guardrails and table-level
> constraints is the difference between *operational* concerns (guardrails)
> and *business* concerns (table-level constraints). The two things are
> only related to one another because they both may limit the value of a
> field in some way, and there are some limited interactions between the two,
> but otherwise are essentially unrelated, and *both* have *independent*
> value.
>
> I absolutely agree that trying to make *configuration* somehow
> transactional/cluster-wide vs. depending on operators to properly configure
> yaml files on each node is a useful feature. It is, however, a much broader
> conversation than just guardrails / constraints, and I don’t think the lack
> of a solution to the “operator misconfigured node X and it has different
> settings than node Y” in any way decreases the value of a table-level
> constraint system for enforcing use-case-specific constraints on the data
> in a table.
>
> The CEP does say:
>
> Interaction with Guardrails Framework
>
> As we mentioned in the motivation section, we currently have some
> guardrails for columns size in place which can be extended for other data
> types.
> Those guardrails will take preference over the defined constraints in the
> schema, and a SCHEMA ALTER adding constraints that break the limits defined
> by the guardrails framework will fail.
> If the guardrails themselves are modified, operator should get a warning
> mentioning that there are schemas with offending constraints.
>
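The interaction described in the CEP excerpt above can be sketched in a few lines of Python. This is only an illustration of the two rules as quoted (a schema change that is looser than the guardrail fails; a guardrail change produces warnings for offending constraints). The class and method names (`Cluster`, `alter_constraint`, `set_guardrail`) are hypothetical and are not taken from the CEP or from Cassandra.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class ConstraintViolatesGuardrail(Exception):
    """Raised when a schema change asks for a looser limit than the guardrail."""


@dataclass
class Cluster:
    # Hypothetical model: a single operator-set upper bound; None means "not set".
    guardrail_max_size: Optional[int] = None
    # Column name -> maximum size allowed by its table-level constraint.
    constraints: Dict[str, int] = field(default_factory=dict)

    def alter_constraint(self, column: str, max_size: int) -> None:
        """Schema change: rejected if it is looser than the current guardrail."""
        limit = self.guardrail_max_size
        if limit is not None and max_size > limit:
            raise ConstraintViolatesGuardrail(
                f"{column}: constraint {max_size} exceeds guardrail {limit}")
        self.constraints[column] = max_size

    def set_guardrail(self, max_size: Optional[int]) -> List[str]:
        """Guardrail change: always applied; returns one warning per offending constraint."""
        self.guardrail_max_size = max_size
        if max_size is None:
            return []
        return [f"constraint on {col} ({value}) exceeds guardrail {max_size}"
                for col, value in sorted(self.constraints.items())
                if value > max_size]
```

Note that `set_guardrail` has to look at every stored constraint, which is the scan discussed in the next paragraph of the message.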
>
> Other than throwing warnings in the logs, I’m not sure how exactly you’d
> warn the operator that there are schemas w/ offending constraints… but I
> suppose that would be enough. Given they are settable via JMX, I suppose
> any time you set one of them it’ll have to scan every constraint definition
> to make sure it doesn’t somehow violate the new guardrail value, which may
> require some additional interaction between the two systems, but again, it
> seems like this would be unrelated to *where* the configuration comes
> from, and we should be able to isolate it in the initial CEP-42
> implementation.
>
> In summary, I'm not seeing how the new constraint framework would require
> significant changes if the guardrails system, and configuration more
> generally, were rewritten to somehow provide a consistent view of the
> configuration across the cluster. In fact, the implementation of
> Guardrails already has a “configuration provider” that, by default,
> happens to wrap the Yaml, but otherwise could pull configuration from other
> sources, so it’s already fairly insulated from configuration *storage*,
> which should make changing the underlying storage to something cluster-wide
> a fairly isolated change.
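The insulation argument above can be illustrated with a small Python sketch. It is not Cassandra's actual provider interface. `GuardrailConfigSource`, `YamlBackedSource`, `ClusterWideSource` and the `max_size` key are all hypothetical names chosen for the example. The point is only that the check itself does not care where the limit is stored.

```python
from typing import Callable, Dict, Optional, Protocol


class GuardrailConfigSource(Protocol):
    """Anything that can answer "what is the current size limit?"."""

    def max_size(self) -> Optional[int]: ...


class YamlBackedSource:
    """Reads the limit from a dict standing in for a parsed per-node YAML file."""

    def __init__(self, parsed_yaml: Dict[str, int]) -> None:
        self._yaml = parsed_yaml

    def max_size(self) -> Optional[int]:
        return self._yaml.get("max_size")  # hypothetical key name


class ClusterWideSource:
    """Reads the limit through a callback standing in for a cluster-wide store."""

    def __init__(self, fetch: Callable[[], Optional[int]]) -> None:
        self._fetch = fetch

    def max_size(self) -> Optional[int]:
        return self._fetch()


def exceeds_guardrail(source: GuardrailConfigSource, size: int) -> bool:
    """The check is identical for every source; only the storage differs."""
    limit = source.max_size()
    return limit is not None and size > limit
```

Swapping `YamlBackedSource` for `ClusterWideSource` changes nothing in `exceeds_guardrail`, which is the sense in which a storage change would be isolated.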
>
>
> Doug
>
> On Jun 6, 2024, at 2:12 PM, Štefan Miklošovič <stefan.mikloso...@gmail.com>
> wrote:
>
> OK so let's modify that example like this:
>
> T0 - a node is started with no guardrails set
> T1 - guardrail is set via JMX to not allow anything bigger than size of 10
> (whatever size means)
> T2 - a user creates a table with a constraint that anything bigger than
> size of 8 is forbidden
> T3 - a user inserts a mutation with size of 7
> T4 - node is restarted and guardrail in cassandra.yaml is set to forbid
> sizes bigger than 5
> T5 - mutation with size of 7 is replayed from FQL and it will fail to
> replay it because of "global guardrail" in yaml
>
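The T0 to T5 timeline above can be replayed as a toy simulation in Python. It is a sketch under stated assumptions, not Cassandra behaviour: a write is accepted only if it is within both the guardrail and the table constraint, the FQL is modelled as a list of the sizes of successful writes, and the table constraint survives the restart while the JMX-set guardrail does not. `Node` and `accepts` are hypothetical names.

```python
from typing import List, Optional


def accepts(size: int, guardrail: Optional[int], constraint: Optional[int]) -> bool:
    """A write passes only if it is within both limits (None means no limit)."""
    return all(limit is None or size <= limit for limit in (guardrail, constraint))


class Node:
    def __init__(self, yaml_guardrail: Optional[int] = None) -> None:
        self.guardrail = yaml_guardrail      # value read from the YAML at startup
        self.constraint: Optional[int] = None
        self.fql: List[int] = []             # sizes of writes that succeeded

    def write(self, size: int) -> bool:
        ok = accepts(size, self.guardrail, self.constraint)
        if ok:
            self.fql.append(size)            # assumption: only successes are logged
        return ok


original = Node()             # T0: started with no guardrail
original.guardrail = 10       # T1: guardrail set via JMX
original.constraint = 8       # T2: table constraint
original.write(7)             # T3: accepted, 7 <= 8 <= 10

restarted = Node(yaml_guardrail=5)   # T4: restart, YAML now forbids sizes above 5
restarted.constraint = 8             # the schema survives the restart
# T5: replay every logged write against the restarted node's limits
replayed = [accepts(size, restarted.guardrail, restarted.constraint)
            for size in original.fql]
```

In this model the write that succeeded at T3 is rejected at T5, so the replay outcome depends on node configuration that is not part of the log.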
> In general, the problem I see with this CEP is that I feel like we clearly
> see that it is a little bit hairy around the configuration and it _can_ be
> broken or misconfigured etc but the feedback I see is that "yeah but ... it
> is possible to break it already, so what?"
>
> I do not follow this logic. If we see that it "leaks", why is the leakage
> an excuse to put more features on top of that? Should we not fix the
> leakage in the first place? Why is that an excuse? I don't get that ... It
> is like "yeah it is broken so by putting more stuff on top of that it can't
> be worse".
>
> What if we focused our effort on making configuration transactional, or
> at least tried to fix this problem so it does not happen? If we do not do
> that before we introduce this, then we will have more work to do once we go
> to address it, but by then it might be too late because we will need to
> live with all the decisions we made earlier, however ineffective they
> might be.
>
>
>
> On Thu, Jun 6, 2024 at 7:33 PM Yifan Cai <yc25c...@gmail.com> wrote:
>
>> Hi Stefan,
>>
>> Thanks for putting together the FQL example! However, it seems to be
>> incorrect. FQL only records the _successful_ queries. The query at T4
>> fails, and it will not be included in the FQL log.
>> I do agree that changing guardrails on the fly can cause confusion when
>> FQL is enabled on the node. Operators should probably avoid doing so. But
>> it seems unrelated to constraints. Besides, there are value size
>> guardrails, i.e. columnValueSize and collectionSize, available in
>> Cassandra already.
>>
>> On extensibility, I agree that the CEP should make it clear what
>> constraints are included and how they work. My understanding is that it
>> wants to have size check and value check, which are useful for most cases.
>>
>> - Yifan
>>
>> On Thu, Jun 6, 2024 at 9:25 AM Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>>> Another problem with this constraints feature is that if it does not
>>> solely rely on constraints in CQL, then it would be non-deterministic if we
>>> want to replay all mutations from an FQL log.
>>>
>>> Let's take this into consideration (T = time)
>>>
>>> T0 - a node is started with no guardrails set
>>> T1 - guardrail is set via JMX to not allow anything bigger than size of
>>> 10 (whatever size means)
>>> T2 - a user creates a table with a constraint that anything bigger than
>>> size of 8 is forbidden
>>> T3 - a user inserts a mutation with size of 5
>>> T4 - a user modifies a table to set the constraint in such a way that
>>> anything bigger than size of 15 is forbidden - this will fail because we
>>> have a guardrail that anything bigger than 10 is forbidden from T1.
>>>
>>> Then we gather the FQL log and restart the node. As guardrails do not
>>> survive restarts for now, T4 will be replayed too when we replay the
>>> log, but it should not be.
>>>
>>> Is this correct?
>>>
>>> On Thu, Jun 6, 2024 at 9:49 AM Štefan Miklošovič <
>>> stefan.mikloso...@gmail.com> wrote:
>>>
>>>> I agree with Jon that a detailed description of all constraints to be
>>>> introduced is necessary. Only to say that it will be extensible so we can
>>>> add other constraints later is not enough. What other constraints?
>>>>
>>>> On Thu, Jun 6, 2024 at 6:24 AM Jon Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>>> I think there's some promising ideas here, but the CEP needs to be
>>>>> developed a bit more.
>>>>>
>>>>> > Other types of constraints and functions can be added in the
>>>>> > future to provide even more flexibility, but are out of the scope
>>>>> > of this CEP.
>>>>>
>>>>> > For the third point, I didn’t want to be prescriptive on what those
>>>>> > validations should be, but the fact that the proposal is extensible
>>>>> > to those potential use cases is something concrete that, in my
>>>>> > opinion, comes as a benefit of the actual proposal. I’d be happy to
>>>>> > develop a bit more the main example used of sizeOf if it helps
>>>>> > alleviate your concerns on this point.
>>>>>
>>>>> I disagree, quite strongly, with this.  While I appreciate
>>>>> extensibility, I think having a variety of actual constraints that ship
>>>>> with the feature means it needs to be built to satisfy real world use
>>>>> cases.  Without going through this process, it feels a bit too much like
>>>>> triggers, UDAs and UDFs  - incomplete, and too much left to the end user.
>>>>>
>>>>> To me, punting on thinking through constraints kicks the most
>>>>> important can down the road.
>>>>>
>>>>> Jon
>>>>>
>>>>>
>>>>> On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella <
>>>>> conta...@bernardobotella.com> wrote:
>>>>>
>>>>>> In the CEP document there is another example (although not explicitly
>>>>>> mentioned) adding a constraint to the max value of an int ->
>>>>>> `number_of_items int CONSTRAINT number_of_items < 1000`
>>>>>>
>>>>>> This basic example can also be used to expand on how to extend this
>>>>>> functionality with these two initial constraints (size and value), by
>>>>>> composing them to create new data types with proper validation.
>>>>>>
>>>>>> For example, this could create an IPv4 type with built-in validation:
>>>>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>>>>   ip_address inet,
>>>>>>   subnet_mask int,
>>>>>>   CONSTRAINT subnet_mask > 0,
>>>>>>   CONSTRAINT subnet_mask < 32
>>>>>> )
>>>>>>
>>>>>> Or a color type:
>>>>>> CREATE TYPE keyspace.color (
>>>>>>   r int,
>>>>>>   g int,
>>>>>>   b int,
>>>>>>   CONSTRAINT r >= 0,
>>>>>>   CONSTRAINT r <= 255,
>>>>>>   CONSTRAINT g >= 0,
>>>>>>   CONSTRAINT g <= 255,
>>>>>>   CONSTRAINT b >= 0,
>>>>>>   CONSTRAINT b <= 255
>>>>>> )
>>>>>>
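To show how composed value constraints like the ones in the two type definitions above would behave, here is a minimal Python sketch. It is an assumption about semantics, not the CEP's implementation: each constraint is modelled as a (field, comparison, bound) triple and a value is valid only if every triple holds. The bounds for `subnet_mask` are copied as written in the message; `OPS`, `CIDR_CONSTRAINTS` and `violations` are hypothetical names.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Comparison symbols as they appear in the CONSTRAINT clauses.
OPS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
}

Constraint = Tuple[str, str, int]  # (field, comparison, bound)

# The two constraints of cidr_address_ipv4, as written in the message.
CIDR_CONSTRAINTS: List[Constraint] = [
    ("subnet_mask", ">", 0),
    ("subnet_mask", "<", 32),
]


def violations(value: Dict[str, int], constraints: List[Constraint]) -> List[str]:
    """Return the constraints that the given value does not satisfy."""
    return [f"{name} {op} {bound}"
            for name, op, bound in constraints
            if not OPS[op](value[name], bound)]
```

With these bounds a mask of 24 passes, while 0 and 32 are both rejected, so whether strict or inclusive comparisons are intended is worth spelling out in the CEP.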
>>>>>>
>>>>>> Other types of constraints and functions can be added in the future
>>>>>> to provide even more flexibility, but are out of the scope of this CEP.
>>>>>>
>>>>>> Bernardo
>>>>>>
>>>>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad <j...@jonhaddad.com> wrote:
>>>>>>
>>>>>> The idea is interesting.  I think it would help to have more concrete
>>>>>> examples.  It's a bit sparse at the moment, and I have a hard time
>>>>>> getting on board with new features where the main selling point is
>>>>>> extensibility over the value they provide on their own.
>>>>>>
>>>>>> I think it would help a lot if we knew what types of constraints,
>>>>>> besides the size check, you were thinking of adding.
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
>>>>>> conta...@bernardobotella.com> wrote:
>>>>>>
>>>>>>> Yes, that is correct. This particular behavior will need CEP-24 in
>>>>>>> order to work reliably. But, if my understanding is correct, that
>>>>>>> statement holds true for the entirety of Guardrails, and not only for
>>>>>>> this particular feature.
>>>>>>>
>>>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>>>>>>> stefan.mikloso...@netapp.com> wrote:
>>>>>>>
>>>>>>> That would work reliably only if there is no way to misconfigure
>>>>>>> guardrails in the cluster. What if you set a guardrail on one node but
>>>>>>> you don’t set it (or set it differently) on the other? If it is
>>>>>>> configured differently and you want to check that constraints do not
>>>>>>> violate the guardrails, then your query might fail or not depending on
>>>>>>> which node is hit.
>>>>>>>
>>>>>>> I guess that guardrails would need to become transactional to be sure
>>>>>>> this is avoided and guardrails are indeed the same everywhere (see the
>>>>>>> CEP-24 thread sent recently here on the ML).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From: *Bernardo Botella <conta...@bernardobotella.com>
>>>>>>> *Date: *Tuesday, 4 June 2024 at 00:31
>>>>>>> *To: *dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>>>> *Cc: *Miklosovic, Stefan <stefan.mikloso...@netapp.com>
>>>>>>> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework
>>>>>>>
>>>>>>> Basically, I am trying to protect the limits set by the operator
>>>>>>> against misconfigured schemas from the customers.
>>>>>>>
>>>>>>> I see the guardrails as a safety limit added by the operator, setting
>>>>>>> the limits within which the customers owning the actual schema (and
>>>>>>> its constraints) can operate. With that vision, if a customer tries to
>>>>>>> “ignore” the actual limits set by the operator by adding more relaxed
>>>>>>> constraints, they get a nice message saying “that is not allowed for
>>>>>>> the cluster, please contact your admin”.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev <
>>>>>>> dev@cassandra.apache.org> wrote:
>>>>>>>
>>>>>>> You wrote in the CEP:
>>>>>>>
>>>>>>> As we mentioned in the motivation section, we currently have some
>>>>>>> guardrails for columns size in place which can be extended for other
>>>>>>> data types.
>>>>>>> Those guardrails will take preference over the defined constraints
>>>>>>> in the schema, and a SCHEMA ALTER adding constraints that break the
>>>>>>> limits defined by the guardrails framework will fail.
>>>>>>> If the guardrails themselves are modified, operator should get a
>>>>>>> warning mentioning that there are schemas with offending constraints.
>>>>>>>
>>>>>>> I think that this should be the other way around. Guardrails should
>>>>>>> kick in when there are no constraints, and they would be overridden by
>>>>>>> the table schema. That way, there is always a “default” in terms of
>>>>>>> guardrails (which one can turn off on demand / change) but you can
>>>>>>> override it by altering the table.
>>>>>>>
>>>>>>> Basically, what is in the schema should win regardless of how
>>>>>>> guardrails are configured. They don’t matter when a constraint is
>>>>>>> explicitly specified in a schema. The defaults in guardrails, if there
>>>>>>> are any, should apply only when no constraint is specified at the
>>>>>>> schema level.
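The two precedence rules under discussion can be put side by side in a short Python sketch. Both functions are hypothetical and compute only the effective upper bound for a single column. The first one simplifies the CEP text, which would reject a looser constraint at ALTER time instead of silently capping it.

```python
from typing import Optional


def limit_guardrail_first(guardrail: Optional[int],
                          constraint: Optional[int]) -> Optional[int]:
    """CEP as quoted: the guardrail caps whatever the schema asks for."""
    limits = [limit for limit in (guardrail, constraint) if limit is not None]
    return min(limits) if limits else None


def limit_schema_first(guardrail: Optional[int],
                       constraint: Optional[int]) -> Optional[int]:
    """Alternative: an explicit constraint wins; the guardrail is only a default."""
    return constraint if constraint is not None else guardrail
```

The two rules agree whenever only one of the limits is set or the constraint is the tighter one. They differ only when a constraint is looser than the guardrail, for example a guardrail of 10 and a constraint of 15.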
>>>>>>>
>>>>>>> What is your motivation to do it like you suggested?
>>>>>>>
>>>>>>>
>>>>>>> *From: *Bernardo Botella <conta...@bernardobotella.com>
>>>>>>> *Date: *Friday, 31 May 2024 at 23:24
>>>>>>> *To: *dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>>>> *Subject: *[DISCUSS] CEP-42: Constraints Framework
>>>>>>>
>>>>>>>
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I am proposing this CEP:
>>>>>>> CEP-42: Constraints Framework - CASSANDRA - Apache Software
>>>>>>> Foundation
>>>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework>
>>>>>>>
>>>>>>>
>>>>>>> And I’m looking for feedback from the community.
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>> Bernardo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>
