It is interesting to see this feedback. In CEP-24 I have been obsessing over a user being able to misconfigure the password validation strength, so that anyone who hits a "weak" node would be able to bypass it. When I compare that with our approach here, I am not sure what I have been waiting so long for. I should probably just be more aggressive with that CEP, and all the "caveats" could simply be overlooked and deferred to "sometime later".
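The concern about a user hitting a "weak" node (also raised for guardrails in the June 3 message at the bottom of this thread) can be shown with a small Java model. This is a minimal sketch, not Cassandra code. `NodeLocalGuardrails`, `Node` and `acceptsConstraint` are hypothetical names. A guardrail is reduced to one optional per-node maximum, and the check follows the CEP-42 rule that a constraint looser than the guardrail is rejected.

```java
import java.util.List;
import java.util.OptionalInt;

/**
 * Hypothetical model (not Cassandra code): each node reads its guardrail from its
 * own local configuration, so the same schema change can be accepted by one
 * coordinator and rejected by another.
 */
public class NodeLocalGuardrails {

    /** A node and its locally configured maximum size; empty means "guardrail not set". */
    record Node(String name, OptionalInt maxSizeGuardrail) {}

    /**
     * Returns true if this coordinator would accept a constraint allowing sizes up to
     * constraintMax, assuming the CEP-42 rule that a looser constraint is rejected.
     */
    static boolean acceptsConstraint(Node coordinator, int constraintMax) {
        OptionalInt guardrail = coordinator.maxSizeGuardrail();
        return guardrail.isEmpty() || constraintMax <= guardrail.getAsInt();
    }

    public static void main(String[] args) {
        // node1 has the guardrail configured; node2 was left unconfigured by mistake.
        List<Node> cluster = List.of(
                new Node("node1", OptionalInt.of(10)),
                new Node("node2", OptionalInt.empty()));
        int constraintMax = 15;
        // Prints "node1 accepts: false" and then "node2 accepts: true".
        for (Node node : cluster) {
            System.out.println(node.name() + " accepts: " + acceptsConstraint(node, constraintMax));
        }
    }
}
```

In this model the outcome of the same request depends only on which coordinator handles it. A cluster-wide, consistent configuration would remove exactly this inconsistency.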
On Thu, Jun 6, 2024 at 9:38 PM Doug Rohrer <droh...@apple.com> wrote:
> To me, the difference between system-level guardrails and table-level constraints is the difference between *operational* concerns (guardrails) and *business* concerns (table-level constraints). The two are related only because both may limit the value of a field in some way. There are some limited interactions between them, but otherwise they are essentially unrelated, and *both* have *independent* value.
>
> I absolutely agree that making *configuration* transactional/cluster-wide, rather than depending on operators to configure the yaml files correctly on each node, would be a useful feature. It is, however, a much broader conversation than guardrails / constraints. I don’t think the lack of a solution to the “operator misconfigured node X and it has different settings than node Y" problem in any way decreases the value of a table-level constraint system for enforcing use-case-specific constraints on the data in a table.
>
> The CEP does say:
>
> > Interaction with Guardrails Framework
> >
> > As we mentioned in the motivation section, we currently have some guardrails for column size in place which can be extended for other data types.
> > Those guardrails will take preference over the defined constraints in the schema, and a SCHEMA ALTER adding constraints that break the limits defined by the guardrails framework will fail.
> > If the guardrails themselves are modified, the operator should get a warning mentioning that there are schemas with offending constraints.
>
> Other than throwing warnings in the logs, I’m not sure how exactly you’d warn the operator that there are schemas w/ offending constraints… but I suppose that would be enough.
> Given they are settable via JMX, I suppose that any time you set one of them, it’ll have to scan every constraint definition to make sure it doesn’t violate the new guardrail value. That may require some additional interaction between the two systems. Again, though, it seems unrelated to *where* the configuration comes from, and we should be able to isolate it in the initial CEP-42 implementation.
>
> In summary, I'm not seeing how the new constraint framework would require significant changes if the guardrails system, and configuration more generally, were rewritten to provide a consistent view of the configuration across the cluster. In fact, the Guardrails implementation already has a “configuration provider” that happens to wrap the yaml by default but could pull configuration from other sources. It is therefore already fairly insulated from configuration *storage*, which should make moving the underlying storage to something cluster-wide a fairly isolated change.
>
> Doug
>
> On Jun 6, 2024, at 2:12 PM, Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:
>
> OK so let's modify that example like this:
>
> T0 - a node is started with no guardrails set
> T1 - a guardrail is set via JMX to not allow anything bigger than size 10 (whatever size means)
> T2 - a user creates a table with a constraint that anything bigger than size 8 is forbidden
> T3 - a user inserts a mutation of size 7
> T4 - the node is restarted and the guardrail in cassandra.yaml is set to forbid sizes bigger than 5
> T5 - the mutation of size 7 is replayed from FQL, and the replay fails because of the "global guardrail" in yaml
>
> In general, the problem I see with this CEP is that we can clearly see it is a little bit hairy around the configuration. It _can_ be broken or misconfigured, etc., but the feedback I see is "yeah, but ... it is possible to break it already, so what?"
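Štefan's T0-T5 timeline can be replayed in a small Java model to make the failure mode concrete. This is a hedged sketch, not how Cassandra or FQL are implemented. `FqlReplayTimeline` and its fields are hypothetical. "Size" is an abstract integer, and both the guardrail and the table constraint are modeled as inclusive maxima. Only accepted mutations are logged, matching Yifan's note further down the thread that FQL records only successful queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

/**
 * Hypothetical model of the T0-T5 timeline (not Cassandra code). A mutation of a
 * given "size" must satisfy both the node guardrail and the table constraint;
 * accepted mutations are appended to a query log that is later replayed.
 */
public class FqlReplayTimeline {

    OptionalInt guardrailMax = OptionalInt.empty(); // node-local, not part of the schema
    int constraintMax = Integer.MAX_VALUE;          // table-level, stored in the schema
    final List<Integer> log = new ArrayList<>();    // sizes of successfully applied mutations

    boolean accepts(int size) {
        boolean withinGuardrail = guardrailMax.isEmpty() || size <= guardrailMax.getAsInt();
        return withinGuardrail && size <= constraintMax;
    }

    /** Applies a mutation; only successful ones are logged. */
    boolean insert(int size) {
        boolean ok = accepts(size);
        if (ok) {
            log.add(size);
        }
        return ok;
    }

    /** Replays the log against the current configuration and returns the sizes that now fail. */
    List<Integer> replay() {
        List<Integer> failed = new ArrayList<>();
        for (int size : log) {
            if (!accepts(size)) {
                failed.add(size);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        FqlReplayTimeline node = new FqlReplayTimeline(); // T0: no guardrails set
        node.guardrailMax = OptionalInt.of(10);            // T1: guardrail set via JMX
        node.constraintMax = 8;                            // T2: table constraint, max size 8
        node.insert(7);                                    // T3: accepted and logged
        node.guardrailMax = OptionalInt.of(5);             // T4: restart, yaml now says 5
        // T5: prints "failed on replay: [7]"
        System.out.println("failed on replay: " + node.replay());
    }
}
```

The replay result depends on node configuration that is recorded neither in the log nor in the schema. This is the non-determinism described in the thread.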
>
> I do not follow this logic. If we see that it "leaks", why is the leakage an excuse to put more features on top of it? Shouldn't we fix the leakage in the first place? Why is that an excuse? I don't get that ... It is like "yeah, it is broken, so putting more stuff on top of it can't make it worse".
>
> What if we focused our effort on making configuration transactional, or at least tried to fix this problem so it does not happen? If we do not do that before we introduce this, we will have more work to do once we address it. By then it might be too late, because we will have to live with all the decisions made earlier, however ineffective they might be.
>
> On Thu, Jun 6, 2024 at 7:33 PM Yifan Cai <yc25c...@gmail.com> wrote:
>
>> Hi Stefan,
>>
>> Thanks for putting together the FQL example! However, it seems to be incorrect. FQL only records _successful_ queries. The query at T4 fails, so it will not be included in the FQL log.
>> I do agree that changing guardrails on the fly can cause confusion when FQL is enabled on the node, and operators should probably avoid doing so. But it seems unrelated to constraints. Besides, there are already value size guardrails in Cassandra, i.e. columnValueSize and collectionSize.
>>
>> On extensibility, I agree that the CEP should make it clear which constraints are included and how they work. My understanding is that it wants a size check and a value check, which are useful for most cases.
>>
>> - Yifan
>>
>> On Thu, Jun 6, 2024 at 9:25 AM Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:
>>
>>> Another problem with this constraints feature is that if it does not rely solely on constraints in CQL, then replaying all mutations from an FQL log would be non-deterministic.
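Yifan's correction, that FQL records only successful queries, can be sketched as a log that appends a statement only after it succeeds. This is a minimal sketch under stated assumptions. `SuccessOnlyLog` and `execute` are hypothetical names, and statements are plain labels rather than real CQL. The success checks mirror the first timeline in Štefan's 9:25 AM message quoted below (guardrail of 10 set at T1, constraint of 8 created at T2).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch (not the FQL implementation): a statement is appended to the
 * log only if it executes successfully, so a rejected ALTER never reaches replay.
 */
public class SuccessOnlyLog {

    private final List<String> entries = new ArrayList<>();

    /** Runs the statement's check; logs the statement only when the check succeeds. */
    boolean execute(String statement, BooleanSupplier succeeds) {
        boolean ok = succeeds.getAsBoolean();
        if (ok) {
            entries.add(statement);
        }
        return ok;
    }

    /** Statements that a replay would re-execute, in order. */
    List<String> entries() {
        return List.copyOf(entries);
    }

    public static void main(String[] args) {
        int guardrailMax = 10; // T1: guardrail set via JMX
        int constraintMax = 8; // T2: table constraint
        SuccessOnlyLog fql = new SuccessOnlyLog();
        fql.execute("T2 create table (max size 8)", () -> constraintMax <= guardrailMax);
        fql.execute("T3 insert (size 5)", () -> 5 <= constraintMax && 5 <= guardrailMax);
        fql.execute("T4 alter table (max size 15)", () -> 15 <= guardrailMax); // rejected
        // Prints "[T2 create table (max size 8), T3 insert (size 5)]".
        System.out.println(fql.entries());
    }
}
```

In this model the failed ALTER at T4 never enters the log, so a replay after a restart cannot re-apply it. That matches the behavior Yifan describes.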
>>>
>>> Let's take this into consideration (T = time):
>>>
>>> T0 - a node is started with no guardrails set
>>> T1 - a guardrail is set via JMX to not allow anything bigger than size 10 (whatever size means)
>>> T2 - a user creates a table with a constraint that anything bigger than size 8 is forbidden
>>> T3 - a user inserts a mutation of size 5
>>> T4 - a user modifies the table so that the constraint forbids anything bigger than size 15; this will fail because the guardrail set at T1 forbids anything bigger than 10.
>>>
>>> Then we gather the FQL log and restart the node. As guardrails set via JMX do not survive restarts for now, T4 will be replayed too when we replay, but it should not be.
>>>
>>> Is this correct?
>>>
>>> On Thu, Jun 6, 2024 at 9:49 AM Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:
>>>
>>>> I agree with Jon that a detailed description of all the constraints to be introduced is necessary. Saying only that it will be extensible, so that we can add other constraints later, is not enough. What other constraints?
>>>>
>>>> On Thu, Jun 6, 2024 at 6:24 AM Jon Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>>> I think there are some promising ideas here, but the CEP needs to be developed a bit more.
>>>>>
>>>>> > Other types of constraints and functions can be added in the future to provide even more flexibility, but are out of the scope of this CEP.
>>>>>
>>>>> > For the third point, I didn’t want to be prescriptive about what those validations should be, but the fact that the proposal is extensible to those potential use cases is something concrete that, in my opinion, comes as a benefit of the actual proposal. I’d be happy to develop the main example of sizeOf a bit more if it helps alleviate your concerns on this point.
>>>>>
>>>>> I disagree, quite strongly, with this.
>>>>> While I appreciate extensibility, I think shipping a variety of actual constraints with the feature means it has to be built to satisfy real-world use cases. Without going through this process, it feels a bit too much like triggers, UDAs and UDFs: incomplete, with too much left to the end user.
>>>>>
>>>>> To me, punting on thinking through constraints kicks the most important can down the road.
>>>>>
>>>>> Jon
>>>>>
>>>>> On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella <conta...@bernardobotella.com> wrote:
>>>>>
>>>>>> In the CEP document there is another example (although not explicitly mentioned) that adds a constraint on the max value of an int:
>>>>>> `number_of_items int CONSTRAINT number_of_items < 1000`
>>>>>>
>>>>>> This basic example can also show how to extend this functionality with the two initial constraints (size and value), by composing them to create new data types with proper validation.
>>>>>>
>>>>>> For example, this could create an IPv4 CIDR address type with built-in validation:
>>>>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>>>>     ip_address inet,
>>>>>>     subnet_mask int,
>>>>>>     CONSTRAINT subnet_mask >= 0,
>>>>>>     CONSTRAINT subnet_mask <= 32
>>>>>> )
>>>>>>
>>>>>> Or a color type:
>>>>>> CREATE TYPE keyspace.color (
>>>>>>     r int,
>>>>>>     g int,
>>>>>>     b int,
>>>>>>     CONSTRAINT r >= 0,
>>>>>>     CONSTRAINT r <= 255,
>>>>>>     CONSTRAINT g >= 0,
>>>>>>     CONSTRAINT g <= 255,
>>>>>>     CONSTRAINT b >= 0,
>>>>>>     CONSTRAINT b <= 255
>>>>>> )
>>>>>>
>>>>>> Other types of constraints and functions can be added in the future to provide even more flexibility, but are out of the scope of this CEP.
>>>>>>
>>>>>> Bernardo
>>>>>>
>>>>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad <j...@jonhaddad.com> wrote:
>>>>>>
>>>>>> The idea is interesting. I think it would help to have more concrete examples.
>>>>>> It's a bit sparse at the moment, and I have a hard time getting on board with new features whose main selling point is extensibility rather than the value they provide on their own.
>>>>>>
>>>>>> I think it would help a lot if we knew what types of constraints, besides the size check, you were thinking of adding.
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <conta...@bernardobotella.com> wrote:
>>>>>>
>>>>>>> Yes, that is correct. This particular behavior will need CEP-24 in order to work reliably. But, if my understanding is correct, that statement holds true for the entirety of Guardrails, and not only for this particular feature.
>>>>>>>
>>>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:
>>>>>>>
>>>>>>> That would work reliably only if there were no way to misconfigure guardrails in the cluster. What if you set a guardrail on one node but don’t set it (or set it differently) on another? If guardrails are configured differently and you want to check that constraints do not violate them, then your query might fail or not depending on which node is hit.
>>>>>>>
>>>>>>> I guess guardrails would need to become transactional to make sure this is avoided and that guardrails are indeed the same everywhere (see the CEP-24 thread recently sent to this ML).
>>>>>>>
>>>>>>> From: Bernardo Botella <conta...@bernardobotella.com>
>>>>>>> Date: Tuesday, 4 June 2024 at 00:31
>>>>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>>>> Cc: Miklosovic, Stefan <stefan.mikloso...@netapp.com>
>>>>>>> Subject: Re: [DISCUSS] CEP-42: Constraints Framework
>>>>>>>
>>>>>>> Basically, I am trying to protect the limits set by the operator against misconfigured schemas from the customers.
>>>>>>>
>>>>>>> I see the guardrails as a safety limit added by the operator, setting the limits within which the customers who own the actual schema (and its constraints) can operate. With that vision, if a customer tries to “ignore” the limits set by the operator by adding more relaxed constraints, they get a nice message saying “that is not allowed for the cluster, please contact your admin".
>>>>>>>
>>>>>>> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev <dev@cassandra.apache.org> wrote:
>>>>>>>
>>>>>>> You wrote in the CEP:
>>>>>>>
>>>>>>> As we mentioned in the motivation section, we currently have some guardrails for column size in place which can be extended for other data types.
>>>>>>> Those guardrails will take preference over the defined constraints in the schema, and a SCHEMA ALTER adding constraints that break the limits defined by the guardrails framework will fail.
>>>>>>> If the guardrails themselves are modified, the operator should get a warning mentioning that there are schemas with offending constraints.
>>>>>>>
>>>>>>> I think this should be the other way around. Guardrails should kick in when there are no constraints, and they would be overridden by the table schema. That way, there is always a “default” in terms of guardrails (which one can turn off or change on demand), but you can override it by altering the table.
>>>>>>>
>>>>>>> Basically, what is in the schema should win regardless of how guardrails are configured.
>>>>>>> They don’t matter when a constraint is explicitly specified in a schema. Guardrail defaults, if there are any, should apply only when no constraint is specified at the schema level.
>>>>>>>
>>>>>>> What is your motivation for doing it the way you suggested?
>>>>>>>
>>>>>>> From: Bernardo Botella <conta...@bernardobotella.com>
>>>>>>> Date: Friday, 31 May 2024 at 23:24
>>>>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>>>> Subject: [DISCUSS] CEP-42: Constraints Framework
>>>>>>>
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I am proposing this CEP:
>>>>>>> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
>>>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework>
>>>>>>>
>>>>>>> And I’m looking for feedback from the community.
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>> Bernardo
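The two precedence rules debated in this thread can be compared side by side. This is a hedged Java sketch, not CEP-42's implementation. `ConstraintPrecedence` and its methods are hypothetical, and each limit is reduced to an inclusive maximum size. "Guardrails take preference" is read here as: the stricter limit applies, and an ALTER that goes past the guardrail is rejected. The sketch also includes the rescan Doug describes for when a guardrail is changed at runtime.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;
import java.util.stream.Collectors;

/**
 * Hypothetical comparison (not CEP-42 code) of two ways to combine a node guardrail
 * with a table-level constraint, both expressed as an inclusive maximum size.
 */
public class ConstraintPrecedence {

    /** CEP-42 as written: an ALTER adding a constraint looser than the guardrail fails. */
    static void validateAlter(int constraintMax, OptionalInt guardrailMax) {
        if (guardrailMax.isPresent() && constraintMax > guardrailMax.getAsInt()) {
            throw new IllegalArgumentException("constraint max " + constraintMax
                    + " exceeds guardrail " + guardrailMax.getAsInt()
                    + ": not allowed for the cluster, please contact your admin");
        }
    }

    /** CEP-42 as written: the guardrail wins, so the stricter of the two limits applies. */
    static OptionalInt effectiveMaxGuardrailWins(OptionalInt constraintMax, OptionalInt guardrailMax) {
        if (constraintMax.isPresent() && guardrailMax.isPresent()) {
            return OptionalInt.of(Math.min(constraintMax.getAsInt(), guardrailMax.getAsInt()));
        }
        return constraintMax.isPresent() ? constraintMax : guardrailMax;
    }

    /** Alternative proposed on June 3: the schema wins; the guardrail is only a default. */
    static OptionalInt effectiveMaxSchemaWins(OptionalInt constraintMax, OptionalInt guardrailMax) {
        return constraintMax.isPresent() ? constraintMax : guardrailMax;
    }

    /** Rescan after a runtime guardrail change: columns whose constraint now exceeds it. */
    static List<String> offendingConstraints(Map<String, Integer> constraintMaxByColumn, int newGuardrailMax) {
        return constraintMaxByColumn.entrySet().stream()
                .filter(e -> e.getValue() > newGuardrailMax)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        OptionalInt guardrail = OptionalInt.of(10);
        OptionalInt constraint = OptionalInt.of(15);
        System.out.println(effectiveMaxGuardrailWins(constraint, guardrail)); // OptionalInt[10]
        System.out.println(effectiveMaxSchemaWins(constraint, guardrail));    // OptionalInt[15]
        try {
            validateAlter(constraint.getAsInt(), guardrail);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // Lowering the guardrail to 8 flags the column whose constraint allows up to 15.
        System.out.println(offendingConstraints(Map.of("name", 5, "payload", 15), 8)); // [payload]
    }
}
```

Under the first rule, the operator's limit is a ceiling the schema cannot exceed. Under the second, a table owner can override it. That makes columns with explicit constraints independent of per-node configuration, but it also lets them bypass the operator's limit.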