> If we have a goal of eventually providing ANSI SQL support one day, we should > at least stick to the ANSI SQL standard where applicable for features in the > meantime. Do we collectively have that goal? Not disagreeing with it at all, genuinely curious.
Broadly I agree that we should definitely keep ourselves *compatible *with ANSI SQL where possible so as not to paint ourselves into a corner nor alienate our users, but I do believe there's value in having a consistent UX while acknowledging and accommodating that things aren't necessarily consistent in the SQL space. Are you against offering a superset of that standard (i.e. CHECK on all constraints a supported form, ANSI SQL standard on things like IS NOT NULL etc)? On Tue, Apr 15, 2025, at 3:05 AM, Benedict wrote: > > I am not certain if the ANSI SQL standard requires that any expression be > supported, but either way it is much better to implement a strict subset of a > feature than to implement an overlapping feature we can never synchronise > with the standard. > > Accord is also capable of imposing multi column expressions without > constraining what user queries can do. > > > >> On 15 Apr 2025, at 02:10, David Capwell <dcapw...@apple.com> wrote: >> One argument I have against using columns in the constraint is that we >> currently are not able to actually handle the CHECK spec in SQL. In SQL the >> CHECK expression is a valid search expression (eg. price > 42 AND tag = >> ‘foo’), and is limited to the current row. >> >> Where we differ in Apache Cassandra >> >> 1) the mutation might not include the columns referenced in the search >> expression: price numeric CHECK (name IS NOT NULL) (requires each mutation >> to define all columns referenced in any constraint) >> 2) our current checks are not normal search expressions (that limit >> functions to deterministic ones), its a allowed listed set of expressions >> (fixable) >> >> If we actually want to be the SQL version of constraints then we need to put >> more restrictions on what people can do. >> >> For example >> >> price numeric CHECK (length(name) > 42) >> >> This constraint can only be handled if both price and name are present in >> the query, so we would need to force all mutations to have price and name, >> as we wouldn’t know the value of name in the following query >> >> INSERT INTO tbl (pk, ck, price) VALUES (0, 0, 0); >> >> So, do we put such a limitation on users and force them to define every >> column present in a constraint, or do we break the SQL spec? If we choose >> to break the spec, then why allow column names in the search expressions? >> Only “this” column is safe >> >>> On Apr 14, 2025, at 1:24 PM, Benedict <bened...@apache.org> wrote: >>> >>> >>> If we have a goal of eventually providing ANSI SQL support one day, we >>> should at least stick to the ANSI SQL standard where applicable for >>> features in the meantime. AFAICT the reason everyone else does this the >>> same is it is part of the standard. So, I am more than happy to stick to >>> the CHECK qualifier for all of our unique extensions, but for NOT NULL we >>> should absolutely follow the SQL standard, and for the same reason we >>> should use the field name in the CHECK expression. >>> >>> >>>> On 14 Apr 2025, at 21:10, Josh McKenzie <jmcken...@apache.org> wrote: >>>> >>>> Consistency *within* our own ecosystem supersedes consistency with other >>>> familiar ecosystems IMO. >>>> >>>> I'd prefer we consistently apply the CHECK keyword and don't have special >>>> cases that omit it, or perhaps have those as optional syntactic sugar but >>>> at its base the syntax is uniform and consistent. >>>> >>>> On Mon, Apr 14, 2025, at 3:31 PM, Štefan Miklošovič wrote: >>>>> To be honest, I agree with Bernardo here. >>>>> >>>>> The requirement to have "val int not null" makes sense at first, but it >>>>> is not so straightforward. >>>>> >>>>> I think that what we are trying to do here is to "copy like they have it" >>>>> because ... well, because they have it like that. And we do not want to >>>>> differ too much, because we like adoption and if it is easier for >>>>> newcomers who are used to some style already, better to support it as >>>>> well in this case because, why not, right? >>>>> >>>>> Yeah, right ... but we are in a different position than SQL >>>>> implementation. I already said it, but I think that what happened first >>>>> was that they started to support "val int not null" and then they had a >>>>> need to support checks as well, so they slapped "val int not null check >>>>> ..." on that. They could not start from scratch, but we can. So not >>>>> supporting "val int not null" in favor of "val int check not null" makes >>>>> more sense to me. So what if it is different? It is at least consistent. >>>>> >>>>> This is a broader discussion to have though. AFAIK, we do not have any >>>>> official policy which would command us to follow SQL as closely as >>>>> possible. We have never agreed on following it like SQL has it >>>>> officially. So "doing it like they have it" as it most a "nice to have" >>>>> but not a strict requirement. >>>>> >>>>> On the other hand, if we ever codified that "we are following SQL" then >>>>> everything would need to follow it to the t. Not only constraints, every >>>>> other new feature which would change CQL would need to behave like that. >>>>> >>>>> We do not have this policy anywhere afaik so when it comes to that I do >>>>> not think that supporting "val int not null" is necessary. I think that >>>>> their mixture of two styles is actually wrong and misleading. Having one >>>>> way to specify it with "val int check ..." is just a superior solution >>>>> here imho. >>>>> >>>>> On Mon, Apr 14, 2025 at 9:14 PM Bernardo Botella >>>>> <conta...@bernardobotella.com> wrote: >>>>>> Now this is becoming a really interesting discussion. Thanks everyone >>>>>> for pitching in! >>>>>> >>>>>> Here is my take on some of the proposed changes: >>>>>> >>>>>> We are talking about treating some constraints (NOT_NULL, JSON) as >>>>>> special cases by omitting the CHECK keyword (not reserved as per current >>>>>> implementation). Now, while this may seem like a nice approach to >>>>>> feature gaps on our CQL, it really worries me that by doing so we open >>>>>> the door to not needed complexity both at implementation and >>>>>> conceptually with the constraints framework. >>>>>> >>>>>> In my mind, what’s the constraints framework? It is a simple and really >>>>>> easy to extend integration point for validators for a row. (LENGHT, >>>>>> SCALAR, REGEX, are really good examples of it). >>>>>> >>>>>> What’s NOT the responsibility of the constraints framework? I don’t >>>>>> think this should be used to deliver partial solutions to feature gaps >>>>>> on CQL data modeling. Let’s take JSON constraint as an example. In the >>>>>> constraints case, it is as simple as checking that the provided string >>>>>> is valid json. Easy. Simple. But, how would JSON look like if it was a >>>>>> first class citizen in CQL? Setting the grammar aside, it would be >>>>>> handled differently probably. Things like: Can we store it better? Do we >>>>>> allow making queries for fields inside the json blob? Are there any >>>>>> optimizations that can be done when serializing/deserializing it? All of >>>>>> those definitely fall out of the scope of the constraints framework. So, >>>>>> I guess the question then becomes, is the JSON constraint a valid >>>>>> constraint to have? Just a temporal patch until (if) JSON type is in? >>>>>> Should we just remove it and keep ignoring JSON? Those are valid >>>>>> questions and discussions to have. But, I really think that we shouldn’t >>>>>> see this simple validator as a full fledged, first class citizen, type >>>>>> in CQL. Similar arguments could be have for the NOT_NULL constraint that >>>>>> has spawned so many interesting conversations. >>>>>> >>>>>> Now, having made that distinction, I don’t think we should have >>>>>> constraints that can be defined differently on the CQL statement. They >>>>>> should all have a CHECK keyword, specifying that they are a constraint >>>>>> that will be checked (aka, row value will be validated against whatever >>>>>> function). That’s easy to identify, and it’s conceptually easy to >>>>>> understand the limitations it comes with (as opposed to the JSON example >>>>>> mentioned above). >>>>>> >>>>>> Bernardo >>>>>> >>>>>> >>>>>> >>>>>>> On Apr 14, 2025, at 10:53 AM, Štefan Miklošovič >>>>>>> <smikloso...@apache.org> wrote: >>>>>>> >>>>>>> As Yifan already said, "check" is not a reserved word now and its usage >>>>>>> does not collide with anything. >>>>>>> >>>>>>> If people have columns, tables, keyspaces with name "check" that seems >>>>>>> to work already so they don't need to do anything: >>>>>>> >>>>>>> CREATE TABLE ks.tb (id int check id > 0, val int check val > 0, primary >>>>>>> key (id)); >>>>>>> >>>>>>> ALTER TABLE ks.tb ADD check int check check > 0; >>>>>>> >>>>>>> DESCRIBE ks.tb; >>>>>>> >>>>>>> CREATE TABLE ks.tb ( >>>>>>> id int CHECK id > 0 PRIMARY KEY, >>>>>>> check int CHECK check > 0, >>>>>>> val int CHECK val > 0 >>>>>>> ) .... >>>>>>> >>>>>>> CREATE TABLE ks.check (id int check id > 0, check int check check > 0, >>>>>>> primary key (id)); >>>>>>> CREATE KEYSPACE check WITH replication = {'class': 'SimpleStrategy', >>>>>>> 'replication_factor': 1}; >>>>>>> CREATE TABLE check.check (check int check check > 0, val int check val >>>>>>> > 0, primary key (check)); >>>>>>> INSERT INTO check.check (check , val ) VALUES ( 1, 1); >>>>>>> >>>>>>> PostgreSQL has this: >>>>>>> >>>>>>> CREATE TABLE products ( >>>>>>> product_no integer, >>>>>>> name text, >>>>>>> price numeric CHECK (price > 0) >>>>>>> ); >>>>>>> >>>>>>> we follow this approach (minus parenthesis). We can also chain >>>>>>> constraints whatever we like >>>>>>> >>>>>>> val int CHECK val > 0 and age < 100 >>>>>>> >>>>>>> We can make a stab in trying to model >>>>>>> >>>>>>> val int not null check val > 0 >>>>>>> >>>>>>> this is how PostgreSQL has it (1). >>>>>>> >>>>>>> but that would be more complicated on the implementation side because >>>>>>> we would need to also accommodate "CQL describe" to dump it like that, >>>>>>> plus I am not sure how complicated it would be to tweak the parser as >>>>>>> well. >>>>>>> >>>>>>> I will try to make some progress and will report back. >>>>>>> >>>>>>> Regards >>>>>>> >>>>>>> (1) >>>>>>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-NOT-NULL >>>>>>> >>>>>>> On Sun, Apr 13, 2025 at 6:49 PM Dinesh Joshi <djo...@apache.org> wrote: >>>>>>>> On Sun, Apr 13, 2025 at 9:24 AM Patrick McFadin <pmcfa...@gmail.com> >>>>>>>> wrote: >>>>>>>>> I'm loving all the syntax discussion lately. It's a good debate and >>>>>>>>> essential for the project's future with a good developer experience. >>>>>>>> >>>>>>>> +1 >>>>>>>> >>>>>>>>> On NULL. I've been asked this a million times by end users. Why is >>>>>>>>> there no "NOT NULL" in the schema? >>>>>>>> >>>>>>>> I would've expected this to be in billions by now ;) >>>>>>>> >>>>>>>>> I'm in favor of the standard SQL syntax here because it's what users >>>>>>>>> have been using forever: >>>>>>>>> name text NOT NULL >>>>>>>> >>>>>>>> I hold a weak opinion on this. We don't have to necessarily align on >>>>>>>> to the standard SQL syntax. In my experience, users subconsciously >>>>>>>> feel Cassandra is a SQL database and try to design their schema to fit >>>>>>>> the traditional SQL / RDBMS design and then later are disappointed to >>>>>>>> find out it doesn't have joins or foreign key constraints. But that's >>>>>>>> just my personal experience working with users. However, I think we >>>>>>>> should strive for consistency and if it aligns with SQL I have no >>>>>>>> issues with the syntax. Just sharing my experience. >>>>>>>> >>>>>>>>> On CHECK. Also in favor of making this a reserved word but in >>>>>>>>> context. Namely, how Postgres SQL works. CHECK ( >>>>>>>>> boolean_expression_on_column) >>>>>>>> >>>>>>>> Making CHECK a reserved keyword may result in issues that Scott >>>>>>>> described. It will present a substantial barrier for users to upgrade >>>>>>>> as applications will have to be updated. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Dinesh >>>>