Re: [DISCUSS] CEP-42: Constraints Framework

Štefan Miklošovič Thu, 06 Jun 2024 11:13:24 -0700

OK, so let's modify that example like this:

T0 - a node is started with no guardrails set
T1 - guardrail is set via JMX to not allow anything bigger than size of 10
(whatever size means)
T2 - a user creates a table with a constraint that anything bigger than
size of 8 is forbidden
T3 - a user inserts a mutation with size of 7
T4 - node is restarted and guardrail in cassandra.yaml is set to forbid
sizes bigger than 5
T5 - the mutation with size of 7 is replayed from FQL and fails because of
the "global guardrail" in yaml
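The timeline above can be modeled in a few lines. This is a minimal sketch, not Cassandra code: `Node`, `write` and `replay` are hypothetical names. It assumes that a write must satisfy both the table constraint and the node's current guardrail (the precedence described in the CEP), and that FQL records only successful writes, as Yifan points out in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical model of a single node; not a Cassandra class.
    guardrail_max: Optional[int] = None   # None means no guardrail is set
    constraint_max: Optional[int] = None  # table-level CONSTRAINT, if any
    fql: List[int] = field(default_factory=list)  # sizes of successful writes

    def accepts(self, size: int) -> bool:
        # A write passes only if it satisfies every limit currently in force.
        limits = [m for m in (self.guardrail_max, self.constraint_max) if m is not None]
        return all(size <= m for m in limits)

    def write(self, size: int) -> bool:
        ok = self.accepts(size)
        if ok:
            self.fql.append(size)  # only successful writes are logged
        return ok

    def replay(self) -> List[bool]:
        # Re-validate each logged write against the configuration in force now.
        return [self.accepts(size) for size in self.fql]


node = Node()              # T0: no guardrails
node.guardrail_max = 10    # T1: guardrail set via JMX
node.constraint_max = 8    # T2: table created with a size constraint of 8
node.write(7)              # T3: accepted and logged (7 <= 8 and 7 <= 10)
node.guardrail_max = 5     # T4: restart; the JMX value is gone, yaml says 5
print(node.replay())       # T5: [False], the logged write no longer passes
```

Replay re-checks each write against whatever guardrails are in force at replay time, which is why the outcome differs from the original run.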


In general, the problem I see with this CEP is that we clearly see that it
is a little bit hairy around the configuration and it _can_ be broken or
misconfigured, etc., but the feedback I get is "yeah, but ... it is possible
to break it already, so what?"

I do not follow this logic. If we see that it "leaks", why is the leakage
an excuse to put more features on top of it? Shouldn't we fix the leakage
in the first place? I don't get that ... It is like saying "yeah, it is
broken, so putting more stuff on top of it can't make it worse".

What if we focused our effort on making configuration transactional, or at
least tried to fix this problem so it does not happen? If we do not do that
before we introduce this, we will have more work to do once we address it,
and by then it might be too late, because we will have to live with all the
decisions made earlier, however ineffective they might be.



On Thu, Jun 6, 2024 at 7:33 PM Yifan Cai <[email protected]> wrote:

> Hi Stefan,
>
> Thanks for putting together the FQL example! However, it seems to be
> incorrect. FQL only records _successful_ queries. The query at T4 fails, so
> it will not be included in the FQL log.
> I do agree that changing guardrails on the fly can cause confusion when
> FQL is enabled on the node, and operators should probably avoid doing so.
> But it seems unrelated to constraints. Besides, there are value size
> guardrails, i.e. columnValueSize and collectionSize, available in Cassandra
> already.
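A sketch of why the T4 statement never reaches the log: a log that appends an entry only after the statement executes without error. `SuccessOnlyLog`, `GuardrailViolation` and `alter_constraint` are hypothetical names used for illustration; this models the behavior described here, not Cassandra's FQL implementation.

```python
from typing import Callable, List


class GuardrailViolation(Exception):
    """Raised when a statement breaks a guardrail (hypothetical)."""


class SuccessOnlyLog:
    """Toy stand-in for a query log that records successful statements only."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def run(self, label: str, execute: Callable[[], None]) -> bool:
        try:
            execute()
        except GuardrailViolation:
            return False  # the failed statement never reaches the log
        self.entries.append(label)
        return True


def alter_constraint(new_max: int, guardrail_max: int = 10) -> Callable[[], None]:
    """Return a callable that 'executes' a schema change setting a size constraint."""
    def execute() -> None:
        if new_max > guardrail_max:
            raise GuardrailViolation(f"{new_max} exceeds guardrail {guardrail_max}")
    return execute


log = SuccessOnlyLog()
log.run("T2: constraint 8", alter_constraint(8))    # succeeds, logged
log.run("T4: constraint 15", alter_constraint(15))  # rejected, not logged
print(log.entries)  # ['T2: constraint 8']
```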
>
> On extensibility, I agree that the CEP should make it clear what
> constraints are included and how they work. My understanding is that it
> wants to have size check and value check, which are useful for most cases.
>
> - Yifan
>
> On Thu, Jun 6, 2024 at 9:25 AM Štefan Miklošovič <
> [email protected]> wrote:
>
>> Another problem with this constraints feature is that if it does not
>> rely solely on constraints in CQL, then replaying all mutations from an
>> FQL log would be non-deterministic.
>>
>> Let's take this into consideration (T = time)
>>
>> T0 - a node is started with no guardrails set
>> T1 - guardrail is set via JMX to not allow anything bigger than size of
>> 10 (whatever size means)
>> T2 - a user creates a table with a constraint that anything bigger than
>> size of 8 is forbidden
>> T3 - a user inserts a mutation with size of 5
>> T4 - a user modifies a table to set the constraint in such a way that
>> anything bigger than size of 15 is forbidden - this will fail because we
>> have a guardrail that anything bigger than 10 is forbidden from T1.
>>
>> Then we gather the FQL log and restart the node. As guardrails do not
>> survive restarts for now, when we replay, T4 will be replayed too, but it
>> should not be.
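The non-determinism comes from the T4 ALTER being judged against guardrail state that lives outside the schema. A minimal sketch, assuming the CEP's rule that a constraint may not be more relaxed than the guardrail; `alter_allowed` is a hypothetical helper, not a Cassandra API.

```python
from typing import Optional


def alter_allowed(new_constraint_max: int, guardrail_max: Optional[int]) -> bool:
    """Assumed CEP rule: an ALTER relaxing a constraint beyond the guardrail fails."""
    return guardrail_max is None or new_constraint_max <= guardrail_max


# T4 as originally executed: the JMX guardrail of 10 is in force.
print(alter_allowed(15, guardrail_max=10))    # False
# The same statement evaluated after a restart: the JMX value is gone.
print(alter_allowed(15, guardrail_max=None))  # True
```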
>>
>> Is this correct?
>>
>> On Thu, Jun 6, 2024 at 9:49 AM Štefan Miklošovič <
>> [email protected]> wrote:
>>
>>> I agree with Jon that a detailed description of all constraints to be
>>> introduced is necessary. Merely saying that it will be extensible, so we
>>> can add other constraints later, is not enough. What other constraints?
>>>
>>> On Thu, Jun 6, 2024 at 6:24 AM Jon Haddad <[email protected]> wrote:
>>>
>>>> I think there's some promising ideas here, but the CEP needs to be
>>>> developed a bit more.
>>>>
>>>> > Other types of constraints and functions can be added in the future
>>>> to provide even more flexibility, but are out of the scope of this CEP.
>>>>
>>>> > For the third point, I didn’t want to be prescriptive about what those
>>>> validations should be, but the fact that the proposal is extensible to
>>>> those potential use cases is something concrete that, in my opinion, comes
>>>> as a benefit of the actual proposal. I’d be happy to develop the main
>>>> example, sizeOf, a bit further if it helps alleviate your concerns on this
>>>> point.
>>>>
>>>> I disagree, quite strongly, with this. While I appreciate
>>>> extensibility, I think having a variety of actual constraints ship with
>>>> the feature means it has to be built to satisfy real-world use cases.
>>>> Without going through this process, it feels a bit too much like
>>>> triggers, UDAs and UDFs: incomplete, and too much left to the end user.
>>>>
>>>> To me, punting on thinking through constraints kicks the most important
>>>> can down the road.
>>>>
>>>> Jon
>>>>
>>>>
>>>> On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella <
>>>> [email protected]> wrote:
>>>>
>>>>> In the CEP document there is another example (although not explicitly
>>>>> mentioned) adding a constraint to the max value of an int:
>>>>> `number_of_items int CONSTRAINT number_of_items < 1000`
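One way to picture the check that a value constraint such as `number_of_items < 1000` implies on every write. This is a hypothetical model, not the CEP's implementation or CQL parser; `ValueConstraint` and `OPS` are names introduced here for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Comparison operators a value constraint might use (assumed set).
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}


@dataclass(frozen=True)
class ValueConstraint:
    column: str
    op: str      # one of the keys in OPS
    bound: Any

    def check(self, row: Dict[str, Any]) -> bool:
        """True if the row's value for `column` satisfies the constraint."""
        return OPS[self.op](row[self.column], self.bound)


limit = ValueConstraint("number_of_items", "<", 1000)
print(limit.check({"number_of_items": 999}))   # True
print(limit.check({"number_of_items": 1000}))  # False
```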
>>>>>
>>>>> This basic example can also be used to show how to extend this
>>>>> functionality with the two initial constraints (size and value), by
>>>>> composing them to create new data types with proper validation.
>>>>>
>>>>> For example, this could create an IPv4 CIDR type with built-in validation:
>>>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>>>   ip_address inet,
>>>>>   subnet_mask int,
>>>>>   CONSTRAINT subnet_mask >= 0,
>>>>>   CONSTRAINT subnet_mask <= 32
>>>>> )
>>>>>
>>>>> Or a color type:
>>>>> CREATE TYPE keyspace.color (
>>>>>   r int,
>>>>>   g int,
>>>>>   b int,
>>>>>   CONSTRAINT r >= 0,
>>>>>   CONSTRAINT r <= 255,
>>>>>   CONSTRAINT g >= 0,
>>>>>   CONSTRAINT g <= 255,
>>>>>   CONSTRAINT b >= 0,
>>>>>   CONSTRAINT b <= 255
>>>>> )
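A sketch of composing several value constraints into one validator per type, in the spirit of the CIDR and color examples. All names are hypothetical. The bounds here are inclusive (0 to 255 per color channel, 0 to 32 for an IPv4 prefix length), since 255 and 32 are themselves valid values.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# (field name, comparison function, bound); hypothetical representation.
Constraint = Tuple[str, Callable[[Any, Any], bool], int]

COLOR: List[Constraint] = [
    (channel, op, bound)
    for channel in ("r", "g", "b")
    for op, bound in ((operator.ge, 0), (operator.le, 255))
]

CIDR_IPV4: List[Constraint] = [
    ("subnet_mask", operator.ge, 0),
    ("subnet_mask", operator.le, 32),
]


def violations(value: Dict[str, Any], constraints: List[Constraint]) -> List[str]:
    """Describe every constraint the value breaks; an empty list means valid."""
    return [
        f"{name} {op.__name__} {bound}"
        for name, op, bound in constraints
        if not op(value[name], bound)
    ]


print(violations({"r": 255, "g": 0, "b": 128}, COLOR))  # []
print(violations({"r": 256, "g": -1, "b": 0}, COLOR))   # ['r le 255', 'g ge 0']
print(violations({"subnet_mask": 32}, CIDR_IPV4))      # []
```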
>>>>>
>>>>>
>>>>> Other types of constraints and functions can be added in the future
>>>>> to provide even more flexibility, but are out of the scope of this CEP.
>>>>>
>>>>> Bernardo
>>>>>
>>>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad <[email protected]> wrote:
>>>>>
>>>>> The idea is interesting. I think it would help to have more concrete
>>>>> examples. It's a bit sparse at the moment, and I have a hard time getting
>>>>> on board with new features where the main selling point is extensibility
>>>>> over the value they provide on their own.
>>>>>
>>>>> I think it would help a lot if we knew what types of constraints,
>>>>> besides the size check, you were thinking of adding.
>>>>>
>>>>> Jon
>>>>>
>>>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Yes, that is correct. This particular behavior will need CEP-24 in
>>>>>> order to work reliably. But, if my understanding is correct, that
>>>>>> statement holds true for the entirety of Guardrails, and not only for
>>>>>> this particular feature.
>>>>>>
>>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> That would work reliably only if there were no way to misconfigure
>>>>>> guardrails in the cluster. What if you set a guardrail on one node but
>>>>>> don’t set it (or set it differently) on another? If guardrails are
>>>>>> configured differently and you check constraints against them, your
>>>>>> query might fail or not depending on which node is hit.
>>>>>>
>>>>>> I guess guardrails would need to become transactional to be sure this
>>>>>> is avoided and that guardrails are indeed the same everywhere (see the
>>>>>> CEP-24 thread sent recently here in the ML).
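To make the concern concrete: if each node holds its own guardrail value, the same schema change is accepted or rejected depending on the coordinator. A hypothetical sketch; node names and limits are illustrative, and the acceptance rule assumes the CEP's "constraint may not exceed the guardrail" check.

```python
from typing import Dict, Optional

# Hypothetical per-node guardrail values (max size); None means not set.
guardrails: Dict[str, Optional[int]] = {"node1": 10, "node2": None, "node3": 20}


def coordinator_accepts(node: str, constraint_max: int) -> bool:
    """Assumed CEP rule: a constraint may not be more relaxed than the guardrail."""
    limit = guardrails[node]
    return limit is None or constraint_max <= limit


outcomes = {node: coordinator_accepts(node, 15) for node in guardrails}
print(outcomes)  # {'node1': False, 'node2': True, 'node3': True}

# The answer is only deterministic if every node agrees.
consistent = len(set(outcomes.values())) == 1
print(consistent)  # False
```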
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Bernardo Botella <[email protected]>
>>>>>> *Date: *Tuesday, 4 June 2024 at 00:31
>>>>>> *To: *[email protected] <[email protected]>
>>>>>> *Cc: *Miklosovic, Stefan <[email protected]>
>>>>>> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework
>>>>>>
>>>>>>
>>>>>>
>>>>>> Basically, I am trying to protect the limits set by the operator
>>>>>> against misconfigured schemas from the customers.
>>>>>>
>>>>>> I see the guardrails as a safety limit added by the operator, setting
>>>>>> the limits within which the customers who own the actual schema (and its
>>>>>> constraints) can operate. With that vision, if a customer tries to
>>>>>> “ignore” the actual limits set by the operator by adding more relaxed
>>>>>> constraints, they get a nice message saying “that is not allowed for the
>>>>>> cluster, please contact your admin”.
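A sketch of the precedence described here, where the guardrail acts as a ceiling: a schema constraint may be stricter than the operator's limit but never more relaxed. `check_schema_constraint` and `SchemaRejected` are hypothetical names, and the error text only paraphrases the message described above.

```python
from typing import Optional


class SchemaRejected(Exception):
    """Raised when a schema change relaxes a limit beyond the operator's guardrail."""


def check_schema_constraint(constraint_max: int, guardrail_max: Optional[int]) -> None:
    # Guardrail is a ceiling: constraints may be stricter, never more relaxed.
    if guardrail_max is not None and constraint_max > guardrail_max:
        raise SchemaRejected(
            f"constraint {constraint_max} exceeds the cluster limit {guardrail_max}; "
            "please contact your admin"
        )


check_schema_constraint(8, guardrail_max=10)  # stricter: accepted
try:
    check_schema_constraint(15, guardrail_max=10)  # more relaxed: rejected
except SchemaRejected as err:
    print(err)
```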
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> You wrote in the CEP:
>>>>>>
>>>>>> As we mentioned in the motivation section, we currently have some
>>>>>> guardrails for column size in place which can be extended for other data
>>>>>> types.
>>>>>> Those guardrails will take precedence over the defined constraints in
>>>>>> the schema, and a SCHEMA ALTER adding constraints that break the limits
>>>>>> defined by the guardrails framework will fail.
>>>>>> If the guardrails themselves are modified, the operator should get a
>>>>>> warning mentioning that there are schemas with offending constraints.
>>>>>>
>>>>>> I think that this should be the other way around. Guardrails should kick
>>>>>> in when there are no constraints, and they would be overridden by the
>>>>>> table schema. That way, there is always a “default” in terms of
>>>>>> guardrails (which one can turn off or change on demand), but you can
>>>>>> override it by altering the table.
>>>>>>
>>>>>> Basically, what is in the schema should win regardless of how guardrails
>>>>>> are configured. They don’t matter when a constraint is explicitly
>>>>>> specified in a schema. The guardrail defaults, if there are any, should
>>>>>> apply only when no constraint is specified at the schema level.
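The two precedence models in this exchange differ only in how the effective limit is chosen. A hypothetical sketch contrasting them; for simplicity the CEP's model is reduced to "the stricter limit applies", whereas under the CEP a more relaxed constraint would already be rejected at ALTER time.

```python
from typing import Optional


def limit_schema_wins(constraint_max: Optional[int],
                      guardrail_max: Optional[int]) -> Optional[int]:
    # Proposal above: an explicit constraint overrides; the guardrail is only a default.
    return constraint_max if constraint_max is not None else guardrail_max


def limit_guardrail_ceiling(constraint_max: Optional[int],
                            guardrail_max: Optional[int]) -> Optional[int]:
    # CEP model, simplified: the stricter of the two limits applies.
    set_limits = [m for m in (constraint_max, guardrail_max) if m is not None]
    return min(set_limits) if set_limits else None


for constraint, guardrail in [(None, 10), (8, 10), (15, 10), (15, None)]:
    print(constraint, guardrail,
          limit_schema_wins(constraint, guardrail),
          limit_guardrail_ceiling(constraint, guardrail))
```

The models disagree only when a constraint is more relaxed than the guardrail, e.g. a constraint of 15 against a guardrail of 10.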
>>>>>>
>>>>>> What is your motivation to do it like you suggested?
>>>>>>
>>>>>>
>>>>>> *From: *Bernardo Botella <[email protected]>
>>>>>> *Date: *Friday, 31 May 2024 at 23:24
>>>>>> *To: *[email protected] <[email protected]>
>>>>>> *Subject: *[DISCUSS] CEP-42: Constraints Framework
>>>>>>
>>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> I am proposing this CEP:
>>>>>> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
>>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework>
>>>>>>
>>>>>>
>>>>>> And I’m looking for feedback from the community.
>>>>>>
>>>>>> Thanks a lot!
>>>>>> Bernardo
>>>>>>
>>>>>>
>>>>>>
>>>>>
