Re: Constraint's "not null" alignment with transactions and their simplification

Jon Haddad Wed, 16 Apr 2025 09:36:51 -0700

I think the world is fairly used to NOT NULL in line with the type.  I
don't see much value in paving our own path for things that are established
norms.  Let's not reinvent wheels that don't need reinventing.


I also think it's fine if the generated CQL from describe doesn't match the
user's input.  We already do this with all the table level fields that get
added.  I'm not sure why we should care if a user drifts away from not null
either.  I think these are pretty irrelevant details in the grand scheme of
things.

A bigger issue, is the actual behavior, in that you can say NOT NULL and
still easily get null values by not supplying the value.  The constraint is
more of a weakly applied suggestion.  I'm expecting a bit of confusion
around this.  I wonder if it makes sense to not allow prepared INSERT
statements that don't contain the field if it's null constrained, although
it doesn't really help us in the case of an upsert.  It does narrow the
edge case a bit, which could be argued is good or bad depending on
someone's mood.

I think it should be made clear that what we're really giving the user when
we say `value check not null` is this:

INSERT INTO blah (id, value) VALUES(?, ? not null);

Does accord change the behavior of INSERT to actually mean INSERT and not
UPSERT, or do you still have to specify IF NOT EXISTS?

Jon



On Wed, Apr 16, 2025 at 12:50 AM Štefan Miklošovič <[email protected]>
wrote:

> Maybe I am too naive and idealist but I can see a world where, even we
> enable them to do it like "val int not null", after they describe a table
> and they see "val int check not null" they start to use the latter form
> because they would be tired of switching two forms in their head all the
> time when they see that all other checks they want to specify have to be
> specified in "val int check ..." form. They might actually start to
> appreciate the unified fashion of it and they will themselves drift away
> from "val int not null" eventually. I would :D.
>
> As we are among leaders in NoSQL databases we have also a great leverage
> over how stuff is going to be used and we can actively form this space. We
> do not need to just follow.
>
> I am open to supporting specifying "val int not null" _together with_ "val
> int check not null" form. Not _exclusively_ SQL syntax only. This syntax
> sugar can go in anytime and it is more or less purely "additive". Even if a
> user specifies it like "val int not null" it will be internally stored as
> "val int check not null" to be consistent with how it is going to be
> represented internally with all other checks. NOT NULL will ultimately be a
> check as any other we have, it is just how it might be defined on the CQL
> level which we are arguing about here.
>
> On Wed, Apr 16, 2025 at 12:56 AM Patrick McFadin <[email protected]>
> wrote:
>
>> I may be to blame for some of that energy. :D
>>
>> No, we don't have consensus on that direction, but I think we will
>> eventually. CQL started out as a subset of SQL but has drifted because
>> features get added async and there just hasn't been any formal standard or
>> road map or lofty goal. We've been laying out track in front of us as we go
>> along. A couple of years ago, I had a conversation at ApacheCon, and it
>> started slowly rattling around my brain that the developer experience
>> destination is regression to a mean and not charting a bold new path.
>> Everything else in Cassandra was already charting plenty of new paths, but
>> if we give users a calm, familiar API, that feels like a winner. SQL is
>> already a standard and could be our roadmap.
>>
>> With all that, I've been getting back into the data modeling game and
>> seeing how a slightly different syntax can lead to angry users. In
>> conversations with users, the idea of being SQL compatible in syntax
>> (features can come whenever) is highly appealing. In reality, we are
>> talking about CQL4, but why create tech debt for later if we design
>> features now? My dream would be to have a compatibility test for Cassandra
>> SQL that ticks off what's left for implementation. This is probably a
>> DISCUSS thread but here we are now.
>>
>> Patrick
>>
>> On Tue, Apr 15, 2025 at 1:47 PM Josh McKenzie <[email protected]>
>> wrote:
>>
>>> If we have a goal of eventually providing ANSI SQL support one day, we
>>> should at least stick to the ANSI SQL standard where applicable for
>>> features in the meantime.
>>>
>>> Do we collectively have that goal? Not disagreeing with it at all,
>>> genuinely curious.
>>>
>>> Broadly I agree that we should definitely keep ourselves *compatible *with
>>> ANSI SQL where possible so as not to paint ourselves into a corner nor
>>> alienate our users, but I do believe there's value in having a consistent
>>> UX while acknowledging and accommodating that things aren't necessarily
>>> consistent in the SQL space.
>>>
>>> Are you against offering a superset of that standard (i.e. CHECK on all
>>> constraints a supported form, ANSI SQL standard on things like IS NOT NULL
>>> etc)?
>>>
>>> On Tue, Apr 15, 2025, at 3:05 AM, Benedict wrote:
>>>
>>>
>>> I am not certain if the ANSI SQL standard requires that any expression
>>> be supported, but either way it is much better to implement a strict subset
>>> of a feature than to implement an overlapping feature we can never
>>> synchronise with the standard.
>>>
>>> Accord is also capable of imposing multi column expressions without
>>> constraining what user queries can do.
>>>
>>>
>>>
>>> On 15 Apr 2025, at 02:10, David Capwell <[email protected]> wrote:
>>>
>>> One argument I have against using columns in the constraint is that we
>>> currently are not able to actually handle the CHECK spec in SQL.  In SQL
>>> the CHECK expression is a valid search expression (eg. price > 42 AND tag =
>>> ‘foo’), and is limited to the current row.
>>>
>>> Where we differ in Apache Cassandra
>>>
>>> 1) the mutation might not include the columns referenced in the search
>>> expression: price numeric CHECK (name IS NOT NULL) (requires each mutation
>>> to define all columns referenced in any constraint)
>>> 2) our current checks are not normal search expressions (that limit
>>> functions to deterministic ones), its a allowed listed set of expressions
>>> (fixable)
>>>
>>> If we actually want to be the SQL version of constraints then we need to
>>> put more restrictions on what people can do.
>>>
>>> For example
>>>
>>> price numeric CHECK (length(name) > 42)
>>>
>>> This constraint can only be handled if both price and name are present
>>> in the query, so we would need to force all mutations to have price and
>>> name, as we wouldn’t know the value of name in the following query
>>>
>>> INSERT INTO tbl (pk, ck, price) VALUES (0, 0, 0);
>>>
>>> So, do we put such a limitation on users and force them to define every
>>> column present in a constraint, or do we break the SQL spec?  If we
>>> choose to break the spec, then why allow column names in the search
>>> expressions?  Only “this” column is safe
>>>
>>> On Apr 14, 2025, at 1:24 PM, Benedict <[email protected]> wrote:
>>>
>>>
>>> If we have a goal of eventually providing ANSI SQL support one day, we
>>> should at least stick to the ANSI SQL standard where applicable for
>>> features in the meantime. AFAICT the reason everyone else does this the
>>> same is it is part of the standard. So, I am more than happy to stick to
>>> the CHECK qualifier for all of our unique extensions, but for NOT NULL we
>>> should absolutely follow the SQL standard, and for the same reason we
>>> should use the field name in the CHECK expression.
>>>
>>>
>>> On 14 Apr 2025, at 21:10, Josh McKenzie <[email protected]> wrote:
>>>
>>> 
>>> Consistency *within* our own ecosystem supersedes consistency with
>>> other familiar ecosystems IMO.
>>>
>>> I'd prefer we consistently apply the CHECK keyword and don't have
>>> special cases that omit it, or perhaps have those as optional syntactic
>>> sugar but at its base the syntax is uniform and consistent.
>>>
>>> On Mon, Apr 14, 2025, at 3:31 PM, Štefan Miklošovič wrote:
>>>
>>> To be honest, I agree with Bernardo here.
>>>
>>> The requirement to have "val int not null" makes sense at first, but it
>>> is not so straightforward.
>>>
>>> I think that what we are trying to do here is to "copy like they have
>>> it" because ... well, because they have it like that. And we do not want to
>>> differ too much, because we like adoption and if it is easier for newcomers
>>> who are used to some style already, better to support it as well in this
>>> case because, why not, right?
>>>
>>> Yeah, right ... but we are in a different position than SQL
>>> implementation. I already said it, but I think that what happened first was
>>> that they started to support "val int not null" and then they had a need to
>>> support checks as well, so they slapped "val int not null check ..." on
>>> that. They could not start from scratch, but we can. So not supporting "val
>>> int not null" in favor of "val int check not null" makes more sense to me.
>>> So what if it is different? It is at least consistent.
>>>
>>> This is a broader discussion to have though. AFAIK, we do not have any
>>> official policy which would command us to follow SQL as closely as
>>> possible. We have never agreed on following it like SQL has it officially.
>>> So "doing it like they have it" as it most a "nice to have" but not a
>>> strict requirement.
>>>
>>> On the other hand, if we ever codified that "we are following SQL" then
>>> everything would need to follow it to the t. Not only constraints, every
>>> other new feature which would change CQL would need to behave like that.
>>>
>>> We do not have this policy anywhere afaik so when it comes to that I do
>>> not think that supporting "val int not null" is necessary. I think that
>>> their mixture of two styles is actually wrong and misleading. Having one
>>> way to specify it with "val int check ..." is just a superior solution here
>>> imho.
>>>
>>> On Mon, Apr 14, 2025 at 9:14 PM Bernardo Botella <
>>> [email protected]> wrote:
>>>
>>> Now this is becoming a really interesting discussion. Thanks everyone
>>> for pitching in!
>>>
>>> Here is my take on some of the proposed changes:
>>>
>>> We are talking about treating some constraints (NOT_NULL, JSON) as
>>> special cases by omitting the CHECK keyword (not reserved as per current
>>> implementation). Now, while this may seem like a nice approach to feature
>>> gaps on our CQL, it really worries me that by doing so we open the door to
>>> not needed complexity both at implementation and conceptually with the
>>> constraints framework.
>>>
>>> In my mind, what’s the constraints framework? It is a simple and really
>>> easy to extend integration point for validators for a row. (LENGHT, SCALAR,
>>> REGEX, are really good examples of it).
>>>
>>> What’s NOT the responsibility of the constraints framework? I don’t
>>> think this should be used to deliver partial solutions to feature gaps on
>>> CQL data modeling. Let’s take JSON constraint as an example. In the
>>> constraints case, it is as simple as checking that the provided string is
>>> valid json. Easy. Simple. But, how would JSON look like if it was a first
>>> class citizen in CQL? Setting the grammar aside, it would be handled
>>> differently probably. Things like: Can we store it better? Do we allow
>>> making queries for fields inside the json blob? Are there any optimizations
>>> that can be done when serializing/deserializing it? All of those definitely
>>> fall out of the scope of the constraints framework. So, I guess the
>>> question then becomes, is the JSON constraint a valid constraint to have?
>>> Just a temporal patch until (if) JSON type is in? Should we just remove it
>>> and keep ignoring JSON? Those are valid questions and discussions to have.
>>> But, I really think that we shouldn’t see this simple validator as a full
>>> fledged, first class citizen, type in CQL. Similar arguments could be have
>>> for the NOT_NULL constraint that has spawned so many interesting
>>> conversations.
>>>
>>> Now, having made that distinction, I don’t think we should have
>>> constraints that can be defined differently on the CQL statement. They
>>> should all have a CHECK keyword, specifying that they are a constraint that
>>> will be checked (aka, row value will be validated against whatever
>>> function). That’s easy to identify, and it’s conceptually easy to
>>> understand the limitations it comes with (as opposed to the JSON example
>>> mentioned above).
>>>
>>> Bernardo
>>>
>>>
>>>
>>> On Apr 14, 2025, at 10:53 AM, Štefan Miklošovič <[email protected]>
>>> wrote:
>>>
>>> As Yifan already said, "check" is not a reserved word now and its usage
>>> does not collide with anything.
>>>
>>> If people have columns, tables, keyspaces with name "check" that seems
>>> to work already so they don't need to do anything:
>>>
>>> CREATE TABLE ks.tb (id int check id > 0, val int check val > 0, primary
>>> key (id));
>>>
>>> ALTER TABLE ks.tb ADD check int check check > 0;
>>>
>>> DESCRIBE ks.tb;
>>>
>>> CREATE TABLE ks.tb (
>>>     id int CHECK id > 0 PRIMARY KEY,
>>>     check int CHECK check > 0,
>>>     val int CHECK val > 0
>>> ) ....
>>>
>>> CREATE TABLE ks.check (id int check id > 0, check int check check > 0,
>>> primary key (id));
>>> CREATE KEYSPACE check WITH replication = {'class': 'SimpleStrategy',
>>> 'replication_factor': 1};
>>> CREATE TABLE check.check (check int check check > 0, val int check val >
>>> 0, primary key (check));
>>> INSERT INTO check.check (check , val ) VALUES ( 1, 1);
>>>
>>> PostgreSQL has this:
>>>
>>> CREATE TABLE products (
>>>     product_no integer,
>>>     name text,
>>>     price numeric CHECK (price > 0)
>>> );
>>>
>>> we follow this approach (minus parenthesis). We can also chain
>>> constraints whatever we like
>>>
>>> val int CHECK val > 0 and age < 100
>>>
>>> We can make a stab in trying to model
>>>
>>> val int not null check val > 0
>>>
>>> this is how PostgreSQL has it (1).
>>>
>>> but that would be more complicated on the implementation side because we
>>> would need to also accommodate "CQL describe" to dump it like that, plus I
>>> am not sure how complicated it would be to tweak the parser as well.
>>>
>>> I will try to make some progress and will report back.
>>>
>>> Regards
>>>
>>> (1)
>>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-NOT-NULL
>>>
>>> On Sun, Apr 13, 2025 at 6:49 PM Dinesh Joshi <[email protected]> wrote:
>>>
>>> On Sun, Apr 13, 2025 at 9:24 AM Patrick McFadin <[email protected]>
>>> wrote:
>>>
>>> I'm loving all the syntax discussion lately. It's a good debate and
>>> essential for the project's future with a good developer experience.
>>>
>>>
>>> +1
>>>
>>>
>>> On NULL. I've been asked this a million times by end users. Why is there
>>> no "NOT NULL" in the schema?
>>>
>>>
>>> I would've expected this to be in billions by now ;)
>>>
>>>
>>> I'm in favor of the standard SQL syntax here because it's what users
>>> have been using forever:
>>> name       text NOT NULL
>>>
>>>
>>> I hold a weak opinion on this. We don't have to necessarily align on to
>>> the standard SQL syntax. In my experience, users subconsciously feel
>>> Cassandra is a SQL database and try to design their schema to fit the
>>> traditional SQL / RDBMS design and then later are disappointed to find out
>>> it doesn't have joins or foreign key constraints. But that's just my
>>> personal experience working with users. However, I think we should strive
>>> for consistency and if it aligns with SQL I have no issues with the syntax.
>>> Just sharing my experience.
>>>
>>>
>>> On CHECK. Also in favor of making this a reserved word but in context.
>>> Namely, how Postgres SQL works. CHECK ( boolean_expression_on_column)
>>>
>>>
>>> Making CHECK a reserved keyword may result in issues that Scott
>>> described. It will present a substantial barrier for users to upgrade as
>>> applications will have to be updated.
>>>
>>> Thanks,
>>>
>>> Dinesh
>>>
>>>
>>>
>>>

Re: Constraint's "not null" alignment with transactions and their simplification

Reply via email to