Re: Constraint's "not null" alignment with transactions and their simplification

Josh McKenzie Tue, 15 Apr 2025 13:46:07 -0700

> If we have a goal of eventually providing ANSI SQL support one day, we should 
> at least stick to the ANSI SQL standard where applicable for features in the 
> meantime.
Do we collectively have that goal? Not disagreeing with it at all, genuinely 
curious.


Broadly I agree that we should definitely keep ourselves *compatible *with ANSI 
SQL where possible so as not to paint ourselves into a corner nor alienate our 
users, but I do believe there's value in having a consistent UX while 
acknowledging and accommodating that things aren't necessarily consistent in 
the SQL space.

Are you against offering a superset of that standard (i.e. CHECK on all 
constraints a supported form, ANSI SQL standard on things like IS NOT NULL etc)?

On Tue, Apr 15, 2025, at 3:05 AM, Benedict wrote:
> 
> I am not certain if the ANSI SQL standard requires that any expression be 
> supported, but either way it is much better to implement a strict subset of a 
> feature than to implement an overlapping feature we can never synchronise 
> with the standard.
> 
> Accord is also capable of imposing multi column expressions without 
> constraining what user queries can do.
> 
> 
> 
>> On 15 Apr 2025, at 02:10, David Capwell <[email protected]> wrote:
>> One argument I have against using columns in the constraint is that we 
>> currently are not able to actually handle the CHECK spec in SQL.  In SQL the 
>> CHECK expression is a valid search expression (eg. price > 42 AND tag = 
>> ‘foo’), and is limited to the current row.
>> 
>> Where we differ in Apache Cassandra
>> 
>> 1) the mutation might not include the columns referenced in the search 
>> expression: price numeric CHECK (name IS NOT NULL) (requires each mutation 
>> to define all columns referenced in any constraint)
>> 2) our current checks are not normal search expressions (that limit 
>> functions to deterministic ones), its a allowed listed set of expressions 
>> (fixable)
>> 
>> If we actually want to be the SQL version of constraints then we need to put 
>> more restrictions on what people can do.
>> 
>> For example
>> 
>> price numeric CHECK (length(name) > 42)
>> 
>> This constraint can only be handled if both price and name are present in 
>> the query, so we would need to force all mutations to have price and name, 
>> as we wouldn’t know the value of name in the following query
>> 
>> INSERT INTO tbl (pk, ck, price) VALUES (0, 0, 0);
>> 
>> So, do we put such a limitation on users and force them to define every 
>> column present in a constraint, or do we break the SQL spec?  If we choose 
>> to break the spec, then why allow column names in the search expressions?  
>> Only “this” column is safe
>> 
>>> On Apr 14, 2025, at 1:24 PM, Benedict <[email protected]> wrote:
>>> 
>>> 
>>> If we have a goal of eventually providing ANSI SQL support one day, we 
>>> should at least stick to the ANSI SQL standard where applicable for 
>>> features in the meantime. AFAICT the reason everyone else does this the 
>>> same is it is part of the standard. So, I am more than happy to stick to 
>>> the CHECK qualifier for all of our unique extensions, but for NOT NULL we 
>>> should absolutely follow the SQL standard, and for the same reason we 
>>> should use the field name in the CHECK expression.
>>> 
>>> 
>>>> On 14 Apr 2025, at 21:10, Josh McKenzie <[email protected]> wrote:
>>>> 
>>>> Consistency *within* our own ecosystem supersedes consistency with other 
>>>> familiar ecosystems IMO.
>>>> 
>>>> I'd prefer we consistently apply the CHECK keyword and don't have special 
>>>> cases that omit it, or perhaps have those as optional syntactic sugar but 
>>>> at its base the syntax is uniform and consistent.
>>>> 
>>>> On Mon, Apr 14, 2025, at 3:31 PM, Štefan Miklošovič wrote:
>>>>> To be honest, I agree with Bernardo here.
>>>>> 
>>>>> The requirement to have "val int not null" makes sense at first, but it 
>>>>> is not so straightforward.
>>>>> 
>>>>> I think that what we are trying to do here is to "copy like they have it" 
>>>>> because ... well, because they have it like that. And we do not want to 
>>>>> differ too much, because we like adoption and if it is easier for 
>>>>> newcomers who are used to some style already, better to support it as 
>>>>> well in this case because, why not, right?
>>>>> 
>>>>> Yeah, right ... but we are in a different position than SQL 
>>>>> implementation. I already said it, but I think that what happened first 
>>>>> was that they started to support "val int not null" and then they had a 
>>>>> need to support checks as well, so they slapped "val int not null check 
>>>>> ..." on that. They could not start from scratch, but we can. So not 
>>>>> supporting "val int not null" in favor of "val int check not null" makes 
>>>>> more sense to me. So what if it is different? It is at least consistent.
>>>>> 
>>>>> This is a broader discussion to have though. AFAIK, we do not have any 
>>>>> official policy which would command us to follow SQL as closely as 
>>>>> possible. We have never agreed on following it like SQL has it 
>>>>> officially. So "doing it like they have it" as it most a "nice to have" 
>>>>> but not a strict requirement.
>>>>> 
>>>>> On the other hand, if we ever codified that "we are following SQL" then 
>>>>> everything would need to follow it to the t. Not only constraints, every 
>>>>> other new feature which would change CQL would need to behave like that.
>>>>> 
>>>>> We do not have this policy anywhere afaik so when it comes to that I do 
>>>>> not think that supporting "val int not null" is necessary. I think that 
>>>>> their mixture of two styles is actually wrong and misleading. Having one 
>>>>> way to specify it with "val int check ..." is just a superior solution 
>>>>> here imho.
>>>>> 
>>>>> On Mon, Apr 14, 2025 at 9:14 PM Bernardo Botella 
>>>>> <[email protected]> wrote:
>>>>>> Now this is becoming a really interesting discussion. Thanks everyone 
>>>>>> for pitching in!
>>>>>> 
>>>>>> Here is my take on some of the proposed changes:
>>>>>> 
>>>>>> We are talking about treating some constraints (NOT_NULL, JSON) as 
>>>>>> special cases by omitting the CHECK keyword (not reserved as per current 
>>>>>> implementation). Now, while this may seem like a nice approach to 
>>>>>> feature gaps on our CQL, it really worries me that by doing so we open 
>>>>>> the door to not needed complexity both at implementation and 
>>>>>> conceptually with the constraints framework. 
>>>>>> 
>>>>>> In my mind, what’s the constraints framework? It is a simple and really 
>>>>>> easy to extend integration point for validators for a row. (LENGHT, 
>>>>>> SCALAR, REGEX, are really good examples of it).
>>>>>> 
>>>>>> What’s NOT the responsibility of the constraints framework? I don’t 
>>>>>> think this should be used to deliver partial solutions to feature gaps 
>>>>>> on CQL data modeling. Let’s take JSON constraint as an example. In the 
>>>>>> constraints case, it is as simple as checking that the provided string 
>>>>>> is valid json. Easy. Simple. But, how would JSON look like if it was a 
>>>>>> first class citizen in CQL? Setting the grammar aside, it would be 
>>>>>> handled differently probably. Things like: Can we store it better? Do we 
>>>>>> allow making queries for fields inside the json blob? Are there any 
>>>>>> optimizations that can be done when serializing/deserializing it? All of 
>>>>>> those definitely fall out of the scope of the constraints framework. So, 
>>>>>> I guess the question then becomes, is the JSON constraint a valid 
>>>>>> constraint to have? Just a temporal patch until (if) JSON type is in? 
>>>>>> Should we just remove it and keep ignoring JSON? Those are valid 
>>>>>> questions and discussions to have. But, I really think that we shouldn’t 
>>>>>> see this simple validator as a full fledged, first class citizen, type 
>>>>>> in CQL. Similar arguments could be have for the NOT_NULL constraint that 
>>>>>> has spawned so many interesting conversations.
>>>>>> 
>>>>>> Now, having made that distinction, I don’t think we should have 
>>>>>> constraints that can be defined differently on the CQL statement. They 
>>>>>> should all have a CHECK keyword, specifying that they are a constraint 
>>>>>> that will be checked (aka, row value will be validated against whatever 
>>>>>> function). That’s easy to identify, and it’s conceptually easy to 
>>>>>> understand the limitations it comes with (as opposed to the JSON example 
>>>>>> mentioned above).
>>>>>> 
>>>>>> Bernardo
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Apr 14, 2025, at 10:53 AM, Štefan Miklošovič 
>>>>>>> <[email protected]> wrote:
>>>>>>> 
>>>>>>> As Yifan already said, "check" is not a reserved word now and its usage 
>>>>>>> does not collide with anything.
>>>>>>> 
>>>>>>> If people have columns, tables, keyspaces with name "check" that seems 
>>>>>>> to work already so they don't need to do anything:
>>>>>>> 
>>>>>>> CREATE TABLE ks.tb (id int check id > 0, val int check val > 0, primary 
>>>>>>> key (id));
>>>>>>> 
>>>>>>> ALTER TABLE ks.tb ADD check int check check > 0;
>>>>>>> 
>>>>>>> DESCRIBE ks.tb;
>>>>>>> 
>>>>>>> CREATE TABLE ks.tb (
>>>>>>>     id int CHECK id > 0 PRIMARY KEY,
>>>>>>>     check int CHECK check > 0,
>>>>>>>     val int CHECK val > 0
>>>>>>> ) ....
>>>>>>> 
>>>>>>> CREATE TABLE ks.check (id int check id > 0, check int check check > 0, 
>>>>>>> primary key (id));
>>>>>>> CREATE KEYSPACE check WITH replication = {'class': 'SimpleStrategy', 
>>>>>>> 'replication_factor': 1};
>>>>>>> CREATE TABLE check.check (check int check check > 0, val int check val 
>>>>>>> > 0, primary key (check));
>>>>>>> INSERT INTO check.check (check , val ) VALUES ( 1, 1);
>>>>>>> 
>>>>>>> PostgreSQL has this:
>>>>>>> 
>>>>>>> CREATE TABLE products (
>>>>>>>     product_no integer,
>>>>>>>     name text,
>>>>>>>     price numeric CHECK (price > 0)
>>>>>>> );
>>>>>>> 
>>>>>>> we follow this approach (minus parenthesis). We can also chain 
>>>>>>> constraints whatever we like
>>>>>>> 
>>>>>>> val int CHECK val > 0 and age < 100
>>>>>>> 
>>>>>>> We can make a stab in trying to model
>>>>>>> 
>>>>>>> val int not null check val > 0
>>>>>>> 
>>>>>>> this is how PostgreSQL has it (1).
>>>>>>> 
>>>>>>> but that would be more complicated on the implementation side because 
>>>>>>> we would need to also accommodate "CQL describe" to dump it like that, 
>>>>>>> plus I am not sure how complicated it would be to tweak the parser as 
>>>>>>> well.
>>>>>>> 
>>>>>>> I will try to make some progress and will report back.
>>>>>>> 
>>>>>>> Regards  
>>>>>>> 
>>>>>>> (1) 
>>>>>>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-NOT-NULL
>>>>>>> 
>>>>>>> On Sun, Apr 13, 2025 at 6:49 PM Dinesh Joshi <[email protected]> wrote:
>>>>>>>> On Sun, Apr 13, 2025 at 9:24 AM Patrick McFadin <[email protected]> 
>>>>>>>> wrote:
>>>>>>>>> I'm loving all the syntax discussion lately. It's a good debate and 
>>>>>>>>> essential for the project's future with a good developer experience.
>>>>>>>> 
>>>>>>>> +1
>>>>>>>>  
>>>>>>>>> On NULL. I've been asked this a million times by end users. Why is 
>>>>>>>>> there no "NOT NULL" in the schema?
>>>>>>>> 
>>>>>>>> I would've expected this to be in billions by now ;)
>>>>>>>>  
>>>>>>>>> I'm in favor of the standard SQL syntax here because it's what users 
>>>>>>>>> have been using forever: 
>>>>>>>>> name       text NOT NULL
>>>>>>>> 
>>>>>>>> I hold a weak opinion on this. We don't have to necessarily align on 
>>>>>>>> to the standard SQL syntax. In my experience, users subconsciously 
>>>>>>>> feel Cassandra is a SQL database and try to design their schema to fit 
>>>>>>>> the traditional SQL / RDBMS design and then later are disappointed to 
>>>>>>>> find out it doesn't have joins or foreign key constraints. But that's 
>>>>>>>> just my personal experience working with users. However, I think we 
>>>>>>>> should strive for consistency and if it aligns with SQL I have no 
>>>>>>>> issues with the syntax. Just sharing my experience.
>>>>>>>>  
>>>>>>>>> On CHECK. Also in favor of making this a reserved word but in 
>>>>>>>>> context. Namely, how Postgres SQL works. CHECK ( 
>>>>>>>>> boolean_expression_on_column)
>>>>>>>> 
>>>>>>>> Making CHECK a reserved keyword may result in issues that Scott 
>>>>>>>> described. It will present a substantial barrier for users to upgrade 
>>>>>>>> as applications will have to be updated.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Dinesh
>>>>

Re: Constraint's "not null" alignment with transactions and their simplification

Reply via email to