Re: Constraint's "not null" alignment with transactions and their simplification

Štefan Miklošovič Wed, 16 Apr 2025 23:41:59 -0700

_A bigger issue, is the actual behavior, in that you can say NOT NULL and
still easily get null values by not supplying the value.  The constraint is
more of a weakly applied suggestion._


This is not true. You can not really put there null. See other discussion
about this behaviour on ML.

On Wed, Apr 16, 2025 at 6:36 PM Jon Haddad <[email protected]> wrote:

> I think the world is fairly used to NOT NULL in line with the type.  I
> don't see much value in paving our own path for things that are established
> norms.  Let's not reinvent wheels that don't need reinventing.
>
> I also think it's fine if the generated CQL from describe doesn't match
> the user's input.  We already do this with all the table level fields that
> get added.  I'm not sure why we should care if a user drifts away from not
> null either.  I think these are pretty irrelevant details in the grand
> scheme of things.
>
> A bigger issue, is the actual behavior, in that you can say NOT NULL and
> still easily get null values by not supplying the value.  The constraint is
> more of a weakly applied suggestion.  I'm expecting a bit of confusion
> around this.  I wonder if it makes sense to not allow prepared INSERT
> statements that don't contain the field if it's null constrained, although
> it doesn't really help us in the case of an upsert.  It does narrow the
> edge case a bit, which could be argued is good or bad depending on
> someone's mood.
>
> I think it should be made clear that what we're really giving the user
> when we say `value check not null` is this:
>
> INSERT INTO blah (id, value) VALUES(?, ? not null);
>
> Does accord change the behavior of INSERT to actually mean INSERT and not
> UPSERT, or do you still have to specify IF NOT EXISTS?
>
> Jon
>
>
>
> On Wed, Apr 16, 2025 at 12:50 AM Štefan Miklošovič <[email protected]>
> wrote:
>
>> Maybe I am too naive and idealist but I can see a world where, even we
>> enable them to do it like "val int not null", after they describe a table
>> and they see "val int check not null" they start to use the latter form
>> because they would be tired of switching two forms in their head all the
>> time when they see that all other checks they want to specify have to be
>> specified in "val int check ..." form. They might actually start to
>> appreciate the unified fashion of it and they will themselves drift away
>> from "val int not null" eventually. I would :D.
>>
>> As we are among leaders in NoSQL databases we have also a great leverage
>> over how stuff is going to be used and we can actively form this space. We
>> do not need to just follow.
>>
>> I am open to supporting specifying "val int not null" _together with_
>> "val int check not null" form. Not _exclusively_ SQL syntax only. This
>> syntax sugar can go in anytime and it is more or less purely "additive".
>> Even if a user specifies it like "val int not null" it will be internally
>> stored as "val int check not null" to be consistent with how it is going to
>> be represented internally with all other checks. NOT NULL will ultimately
>> be a check as any other we have, it is just how it might be defined on the
>> CQL level which we are arguing about here.
>>
>> On Wed, Apr 16, 2025 at 12:56 AM Patrick McFadin <[email protected]>
>> wrote:
>>
>>> I may be to blame for some of that energy. :D
>>>
>>> No, we don't have consensus on that direction, but I think we will
>>> eventually. CQL started out as a subset of SQL but has drifted because
>>> features get added async and there just hasn't been any formal standard or
>>> road map or lofty goal. We've been laying out track in front of us as we go
>>> along. A couple of years ago, I had a conversation at ApacheCon, and it
>>> started slowly rattling around my brain that the developer experience
>>> destination is regression to a mean and not charting a bold new path.
>>> Everything else in Cassandra was already charting plenty of new paths, but
>>> if we give users a calm, familiar API, that feels like a winner. SQL is
>>> already a standard and could be our roadmap.
>>>
>>> With all that, I've been getting back into the data modeling game and
>>> seeing how a slightly different syntax can lead to angry users. In
>>> conversations with users, the idea of being SQL compatible in syntax
>>> (features can come whenever) is highly appealing. In reality, we are
>>> talking about CQL4, but why create tech debt for later if we design
>>> features now? My dream would be to have a compatibility test for Cassandra
>>> SQL that ticks off what's left for implementation. This is probably a
>>> DISCUSS thread but here we are now.
>>>
>>> Patrick
>>>
>>> On Tue, Apr 15, 2025 at 1:47 PM Josh McKenzie <[email protected]>
>>> wrote:
>>>
>>>> If we have a goal of eventually providing ANSI SQL support one day, we
>>>> should at least stick to the ANSI SQL standard where applicable for
>>>> features in the meantime.
>>>>
>>>> Do we collectively have that goal? Not disagreeing with it at all,
>>>> genuinely curious.
>>>>
>>>> Broadly I agree that we should definitely keep ourselves *compatible *with
>>>> ANSI SQL where possible so as not to paint ourselves into a corner nor
>>>> alienate our users, but I do believe there's value in having a consistent
>>>> UX while acknowledging and accommodating that things aren't necessarily
>>>> consistent in the SQL space.
>>>>
>>>> Are you against offering a superset of that standard (i.e. CHECK on all
>>>> constraints a supported form, ANSI SQL standard on things like IS NOT NULL
>>>> etc)?
>>>>
>>>> On Tue, Apr 15, 2025, at 3:05 AM, Benedict wrote:
>>>>
>>>>
>>>> I am not certain if the ANSI SQL standard requires that any expression
>>>> be supported, but either way it is much better to implement a strict subset
>>>> of a feature than to implement an overlapping feature we can never
>>>> synchronise with the standard.
>>>>
>>>> Accord is also capable of imposing multi column expressions without
>>>> constraining what user queries can do.
>>>>
>>>>
>>>>
>>>> On 15 Apr 2025, at 02:10, David Capwell <[email protected]> wrote:
>>>>
>>>> One argument I have against using columns in the constraint is that we
>>>> currently are not able to actually handle the CHECK spec in SQL.  In SQL
>>>> the CHECK expression is a valid search expression (eg. price > 42 AND tag =
>>>> ‘foo’), and is limited to the current row.
>>>>
>>>> Where we differ in Apache Cassandra
>>>>
>>>> 1) the mutation might not include the columns referenced in the search
>>>> expression: price numeric CHECK (name IS NOT NULL) (requires each mutation
>>>> to define all columns referenced in any constraint)
>>>> 2) our current checks are not normal search expressions (that limit
>>>> functions to deterministic ones), its a allowed listed set of expressions
>>>> (fixable)
>>>>
>>>> If we actually want to be the SQL version of constraints then we need
>>>> to put more restrictions on what people can do.
>>>>
>>>> For example
>>>>
>>>> price numeric CHECK (length(name) > 42)
>>>>
>>>> This constraint can only be handled if both price and name are present
>>>> in the query, so we would need to force all mutations to have price and
>>>> name, as we wouldn’t know the value of name in the following query
>>>>
>>>> INSERT INTO tbl (pk, ck, price) VALUES (0, 0, 0);
>>>>
>>>> So, do we put such a limitation on users and force them to define
>>>> every column present in a constraint, or do we break the SQL spec?  If
>>>> we choose to break the spec, then why allow column names in the search
>>>> expressions?  Only “this” column is safe
>>>>
>>>> On Apr 14, 2025, at 1:24 PM, Benedict <[email protected]> wrote:
>>>>
>>>>
>>>> If we have a goal of eventually providing ANSI SQL support one day, we
>>>> should at least stick to the ANSI SQL standard where applicable for
>>>> features in the meantime. AFAICT the reason everyone else does this the
>>>> same is it is part of the standard. So, I am more than happy to stick to
>>>> the CHECK qualifier for all of our unique extensions, but for NOT NULL we
>>>> should absolutely follow the SQL standard, and for the same reason we
>>>> should use the field name in the CHECK expression.
>>>>
>>>>
>>>> On 14 Apr 2025, at 21:10, Josh McKenzie <[email protected]> wrote:
>>>>
>>>> 
>>>> Consistency *within* our own ecosystem supersedes consistency with
>>>> other familiar ecosystems IMO.
>>>>
>>>> I'd prefer we consistently apply the CHECK keyword and don't have
>>>> special cases that omit it, or perhaps have those as optional syntactic
>>>> sugar but at its base the syntax is uniform and consistent.
>>>>
>>>> On Mon, Apr 14, 2025, at 3:31 PM, Štefan Miklošovič wrote:
>>>>
>>>> To be honest, I agree with Bernardo here.
>>>>
>>>> The requirement to have "val int not null" makes sense at first, but it
>>>> is not so straightforward.
>>>>
>>>> I think that what we are trying to do here is to "copy like they have
>>>> it" because ... well, because they have it like that. And we do not want to
>>>> differ too much, because we like adoption and if it is easier for newcomers
>>>> who are used to some style already, better to support it as well in this
>>>> case because, why not, right?
>>>>
>>>> Yeah, right ... but we are in a different position than SQL
>>>> implementation. I already said it, but I think that what happened first was
>>>> that they started to support "val int not null" and then they had a need to
>>>> support checks as well, so they slapped "val int not null check ..." on
>>>> that. They could not start from scratch, but we can. So not supporting "val
>>>> int not null" in favor of "val int check not null" makes more sense to me.
>>>> So what if it is different? It is at least consistent.
>>>>
>>>> This is a broader discussion to have though. AFAIK, we do not have any
>>>> official policy which would command us to follow SQL as closely as
>>>> possible. We have never agreed on following it like SQL has it officially.
>>>> So "doing it like they have it" as it most a "nice to have" but not a
>>>> strict requirement.
>>>>
>>>> On the other hand, if we ever codified that "we are following SQL" then
>>>> everything would need to follow it to the t. Not only constraints, every
>>>> other new feature which would change CQL would need to behave like that.
>>>>
>>>> We do not have this policy anywhere afaik so when it comes to that I do
>>>> not think that supporting "val int not null" is necessary. I think that
>>>> their mixture of two styles is actually wrong and misleading. Having one
>>>> way to specify it with "val int check ..." is just a superior solution here
>>>> imho.
>>>>
>>>> On Mon, Apr 14, 2025 at 9:14 PM Bernardo Botella <
>>>> [email protected]> wrote:
>>>>
>>>> Now this is becoming a really interesting discussion. Thanks everyone
>>>> for pitching in!
>>>>
>>>> Here is my take on some of the proposed changes:
>>>>
>>>> We are talking about treating some constraints (NOT_NULL, JSON) as
>>>> special cases by omitting the CHECK keyword (not reserved as per current
>>>> implementation). Now, while this may seem like a nice approach to feature
>>>> gaps on our CQL, it really worries me that by doing so we open the door to
>>>> not needed complexity both at implementation and conceptually with the
>>>> constraints framework.
>>>>
>>>> In my mind, what’s the constraints framework? It is a simple and really
>>>> easy to extend integration point for validators for a row. (LENGHT, SCALAR,
>>>> REGEX, are really good examples of it).
>>>>
>>>> What’s NOT the responsibility of the constraints framework? I don’t
>>>> think this should be used to deliver partial solutions to feature gaps on
>>>> CQL data modeling. Let’s take JSON constraint as an example. In the
>>>> constraints case, it is as simple as checking that the provided string is
>>>> valid json. Easy. Simple. But, how would JSON look like if it was a first
>>>> class citizen in CQL? Setting the grammar aside, it would be handled
>>>> differently probably. Things like: Can we store it better? Do we allow
>>>> making queries for fields inside the json blob? Are there any optimizations
>>>> that can be done when serializing/deserializing it? All of those definitely
>>>> fall out of the scope of the constraints framework. So, I guess the
>>>> question then becomes, is the JSON constraint a valid constraint to have?
>>>> Just a temporal patch until (if) JSON type is in? Should we just remove it
>>>> and keep ignoring JSON? Those are valid questions and discussions to have.
>>>> But, I really think that we shouldn’t see this simple validator as a full
>>>> fledged, first class citizen, type in CQL. Similar arguments could be have
>>>> for the NOT_NULL constraint that has spawned so many interesting
>>>> conversations.
>>>>
>>>> Now, having made that distinction, I don’t think we should have
>>>> constraints that can be defined differently on the CQL statement. They
>>>> should all have a CHECK keyword, specifying that they are a constraint that
>>>> will be checked (aka, row value will be validated against whatever
>>>> function). That’s easy to identify, and it’s conceptually easy to
>>>> understand the limitations it comes with (as opposed to the JSON example
>>>> mentioned above).
>>>>
>>>> Bernardo
>>>>
>>>>
>>>>
>>>> On Apr 14, 2025, at 10:53 AM, Štefan Miklošovič <[email protected]>
>>>> wrote:
>>>>
>>>> As Yifan already said, "check" is not a reserved word now and its usage
>>>> does not collide with anything.
>>>>
>>>> If people have columns, tables, keyspaces with name "check" that seems
>>>> to work already so they don't need to do anything:
>>>>
>>>> CREATE TABLE ks.tb (id int check id > 0, val int check val > 0, primary
>>>> key (id));
>>>>
>>>> ALTER TABLE ks.tb ADD check int check check > 0;
>>>>
>>>> DESCRIBE ks.tb;
>>>>
>>>> CREATE TABLE ks.tb (
>>>>     id int CHECK id > 0 PRIMARY KEY,
>>>>     check int CHECK check > 0,
>>>>     val int CHECK val > 0
>>>> ) ....
>>>>
>>>> CREATE TABLE ks.check (id int check id > 0, check int check check > 0,
>>>> primary key (id));
>>>> CREATE KEYSPACE check WITH replication = {'class': 'SimpleStrategy',
>>>> 'replication_factor': 1};
>>>> CREATE TABLE check.check (check int check check > 0, val int check val
>>>> > 0, primary key (check));
>>>> INSERT INTO check.check (check , val ) VALUES ( 1, 1);
>>>>
>>>> PostgreSQL has this:
>>>>
>>>> CREATE TABLE products (
>>>>     product_no integer,
>>>>     name text,
>>>>     price numeric CHECK (price > 0)
>>>> );
>>>>
>>>> we follow this approach (minus parenthesis). We can also chain
>>>> constraints whatever we like
>>>>
>>>> val int CHECK val > 0 and age < 100
>>>>
>>>> We can make a stab in trying to model
>>>>
>>>> val int not null check val > 0
>>>>
>>>> this is how PostgreSQL has it (1).
>>>>
>>>> but that would be more complicated on the implementation side because
>>>> we would need to also accommodate "CQL describe" to dump it like that, plus
>>>> I am not sure how complicated it would be to tweak the parser as well.
>>>>
>>>> I will try to make some progress and will report back.
>>>>
>>>> Regards
>>>>
>>>> (1)
>>>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-NOT-NULL
>>>>
>>>> On Sun, Apr 13, 2025 at 6:49 PM Dinesh Joshi <[email protected]> wrote:
>>>>
>>>> On Sun, Apr 13, 2025 at 9:24 AM Patrick McFadin <[email protected]>
>>>> wrote:
>>>>
>>>> I'm loving all the syntax discussion lately. It's a good debate and
>>>> essential for the project's future with a good developer experience.
>>>>
>>>>
>>>> +1
>>>>
>>>>
>>>> On NULL. I've been asked this a million times by end users. Why is
>>>> there no "NOT NULL" in the schema?
>>>>
>>>>
>>>> I would've expected this to be in billions by now ;)
>>>>
>>>>
>>>> I'm in favor of the standard SQL syntax here because it's what users
>>>> have been using forever:
>>>> name       text NOT NULL
>>>>
>>>>
>>>> I hold a weak opinion on this. We don't have to necessarily align on to
>>>> the standard SQL syntax. In my experience, users subconsciously feel
>>>> Cassandra is a SQL database and try to design their schema to fit the
>>>> traditional SQL / RDBMS design and then later are disappointed to find out
>>>> it doesn't have joins or foreign key constraints. But that's just my
>>>> personal experience working with users. However, I think we should strive
>>>> for consistency and if it aligns with SQL I have no issues with the syntax.
>>>> Just sharing my experience.
>>>>
>>>>
>>>> On CHECK. Also in favor of making this a reserved word but in context.
>>>> Namely, how Postgres SQL works. CHECK ( boolean_expression_on_column)
>>>>
>>>>
>>>> Making CHECK a reserved keyword may result in issues that Scott
>>>> described. It will present a substantial barrier for users to upgrade as
>>>> applications will have to be updated.
>>>>
>>>> Thanks,
>>>>
>>>> Dinesh
>>>>
>>>>
>>>>
>>>>

Re: Constraint's "not null" alignment with transactions and their simplification

Reply via email to