_A bigger issue, is the actual behavior, in that you can say NOT NULL and still easily get null values by not supplying the value. The constraint is more of a weakly applied suggestion._
This is not true. You can not really put there null. See other discussion about this behaviour on ML. On Wed, Apr 16, 2025 at 6:36 PM Jon Haddad <j...@rustyrazorblade.com> wrote: > I think the world is fairly used to NOT NULL in line with the type. I > don't see much value in paving our own path for things that are established > norms. Let's not reinvent wheels that don't need reinventing. > > I also think it's fine if the generated CQL from describe doesn't match > the user's input. We already do this with all the table level fields that > get added. I'm not sure why we should care if a user drifts away from not > null either. I think these are pretty irrelevant details in the grand > scheme of things. > > A bigger issue, is the actual behavior, in that you can say NOT NULL and > still easily get null values by not supplying the value. The constraint is > more of a weakly applied suggestion. I'm expecting a bit of confusion > around this. I wonder if it makes sense to not allow prepared INSERT > statements that don't contain the field if it's null constrained, although > it doesn't really help us in the case of an upsert. It does narrow the > edge case a bit, which could be argued is good or bad depending on > someone's mood. > > I think it should be made clear that what we're really giving the user > when we say `value check not null` is this: > > INSERT INTO blah (id, value) VALUES(?, ? not null); > > Does accord change the behavior of INSERT to actually mean INSERT and not > UPSERT, or do you still have to specify IF NOT EXISTS? > > Jon > > > > On Wed, Apr 16, 2025 at 12:50 AM Štefan Miklošovič <smikloso...@apache.org> > wrote: > >> Maybe I am too naive and idealist but I can see a world where, even we >> enable them to do it like "val int not null", after they describe a table >> and they see "val int check not null" they start to use the latter form >> because they would be tired of switching two forms in their head all the >> time when they see that all other checks they want to specify have to be >> specified in "val int check ..." form. They might actually start to >> appreciate the unified fashion of it and they will themselves drift away >> from "val int not null" eventually. I would :D. >> >> As we are among leaders in NoSQL databases we have also a great leverage >> over how stuff is going to be used and we can actively form this space. We >> do not need to just follow. >> >> I am open to supporting specifying "val int not null" _together with_ >> "val int check not null" form. Not _exclusively_ SQL syntax only. This >> syntax sugar can go in anytime and it is more or less purely "additive". >> Even if a user specifies it like "val int not null" it will be internally >> stored as "val int check not null" to be consistent with how it is going to >> be represented internally with all other checks. NOT NULL will ultimately >> be a check as any other we have, it is just how it might be defined on the >> CQL level which we are arguing about here. >> >> On Wed, Apr 16, 2025 at 12:56 AM Patrick McFadin <pmcfa...@gmail.com> >> wrote: >> >>> I may be to blame for some of that energy. :D >>> >>> No, we don't have consensus on that direction, but I think we will >>> eventually. CQL started out as a subset of SQL but has drifted because >>> features get added async and there just hasn't been any formal standard or >>> road map or lofty goal. We've been laying out track in front of us as we go >>> along. A couple of years ago, I had a conversation at ApacheCon, and it >>> started slowly rattling around my brain that the developer experience >>> destination is regression to a mean and not charting a bold new path. >>> Everything else in Cassandra was already charting plenty of new paths, but >>> if we give users a calm, familiar API, that feels like a winner. SQL is >>> already a standard and could be our roadmap. >>> >>> With all that, I've been getting back into the data modeling game and >>> seeing how a slightly different syntax can lead to angry users. In >>> conversations with users, the idea of being SQL compatible in syntax >>> (features can come whenever) is highly appealing. In reality, we are >>> talking about CQL4, but why create tech debt for later if we design >>> features now? My dream would be to have a compatibility test for Cassandra >>> SQL that ticks off what's left for implementation. This is probably a >>> DISCUSS thread but here we are now. >>> >>> Patrick >>> >>> On Tue, Apr 15, 2025 at 1:47 PM Josh McKenzie <jmcken...@apache.org> >>> wrote: >>> >>>> If we have a goal of eventually providing ANSI SQL support one day, we >>>> should at least stick to the ANSI SQL standard where applicable for >>>> features in the meantime. >>>> >>>> Do we collectively have that goal? Not disagreeing with it at all, >>>> genuinely curious. >>>> >>>> Broadly I agree that we should definitely keep ourselves *compatible *with >>>> ANSI SQL where possible so as not to paint ourselves into a corner nor >>>> alienate our users, but I do believe there's value in having a consistent >>>> UX while acknowledging and accommodating that things aren't necessarily >>>> consistent in the SQL space. >>>> >>>> Are you against offering a superset of that standard (i.e. CHECK on all >>>> constraints a supported form, ANSI SQL standard on things like IS NOT NULL >>>> etc)? >>>> >>>> On Tue, Apr 15, 2025, at 3:05 AM, Benedict wrote: >>>> >>>> >>>> I am not certain if the ANSI SQL standard requires that any expression >>>> be supported, but either way it is much better to implement a strict subset >>>> of a feature than to implement an overlapping feature we can never >>>> synchronise with the standard. >>>> >>>> Accord is also capable of imposing multi column expressions without >>>> constraining what user queries can do. >>>> >>>> >>>> >>>> On 15 Apr 2025, at 02:10, David Capwell <dcapw...@apple.com> wrote: >>>> >>>> One argument I have against using columns in the constraint is that we >>>> currently are not able to actually handle the CHECK spec in SQL. In SQL >>>> the CHECK expression is a valid search expression (eg. price > 42 AND tag = >>>> ‘foo’), and is limited to the current row. >>>> >>>> Where we differ in Apache Cassandra >>>> >>>> 1) the mutation might not include the columns referenced in the search >>>> expression: price numeric CHECK (name IS NOT NULL) (requires each mutation >>>> to define all columns referenced in any constraint) >>>> 2) our current checks are not normal search expressions (that limit >>>> functions to deterministic ones), its a allowed listed set of expressions >>>> (fixable) >>>> >>>> If we actually want to be the SQL version of constraints then we need >>>> to put more restrictions on what people can do. >>>> >>>> For example >>>> >>>> price numeric CHECK (length(name) > 42) >>>> >>>> This constraint can only be handled if both price and name are present >>>> in the query, so we would need to force all mutations to have price and >>>> name, as we wouldn’t know the value of name in the following query >>>> >>>> INSERT INTO tbl (pk, ck, price) VALUES (0, 0, 0); >>>> >>>> So, do we put such a limitation on users and force them to define >>>> every column present in a constraint, or do we break the SQL spec? If >>>> we choose to break the spec, then why allow column names in the search >>>> expressions? Only “this” column is safe >>>> >>>> On Apr 14, 2025, at 1:24 PM, Benedict <bened...@apache.org> wrote: >>>> >>>> >>>> If we have a goal of eventually providing ANSI SQL support one day, we >>>> should at least stick to the ANSI SQL standard where applicable for >>>> features in the meantime. AFAICT the reason everyone else does this the >>>> same is it is part of the standard. So, I am more than happy to stick to >>>> the CHECK qualifier for all of our unique extensions, but for NOT NULL we >>>> should absolutely follow the SQL standard, and for the same reason we >>>> should use the field name in the CHECK expression. >>>> >>>> >>>> On 14 Apr 2025, at 21:10, Josh McKenzie <jmcken...@apache.org> wrote: >>>> >>>> >>>> Consistency *within* our own ecosystem supersedes consistency with >>>> other familiar ecosystems IMO. >>>> >>>> I'd prefer we consistently apply the CHECK keyword and don't have >>>> special cases that omit it, or perhaps have those as optional syntactic >>>> sugar but at its base the syntax is uniform and consistent. >>>> >>>> On Mon, Apr 14, 2025, at 3:31 PM, Štefan Miklošovič wrote: >>>> >>>> To be honest, I agree with Bernardo here. >>>> >>>> The requirement to have "val int not null" makes sense at first, but it >>>> is not so straightforward. >>>> >>>> I think that what we are trying to do here is to "copy like they have >>>> it" because ... well, because they have it like that. And we do not want to >>>> differ too much, because we like adoption and if it is easier for newcomers >>>> who are used to some style already, better to support it as well in this >>>> case because, why not, right? >>>> >>>> Yeah, right ... but we are in a different position than SQL >>>> implementation. I already said it, but I think that what happened first was >>>> that they started to support "val int not null" and then they had a need to >>>> support checks as well, so they slapped "val int not null check ..." on >>>> that. They could not start from scratch, but we can. So not supporting "val >>>> int not null" in favor of "val int check not null" makes more sense to me. >>>> So what if it is different? It is at least consistent. >>>> >>>> This is a broader discussion to have though. AFAIK, we do not have any >>>> official policy which would command us to follow SQL as closely as >>>> possible. We have never agreed on following it like SQL has it officially. >>>> So "doing it like they have it" as it most a "nice to have" but not a >>>> strict requirement. >>>> >>>> On the other hand, if we ever codified that "we are following SQL" then >>>> everything would need to follow it to the t. Not only constraints, every >>>> other new feature which would change CQL would need to behave like that. >>>> >>>> We do not have this policy anywhere afaik so when it comes to that I do >>>> not think that supporting "val int not null" is necessary. I think that >>>> their mixture of two styles is actually wrong and misleading. Having one >>>> way to specify it with "val int check ..." is just a superior solution here >>>> imho. >>>> >>>> On Mon, Apr 14, 2025 at 9:14 PM Bernardo Botella < >>>> conta...@bernardobotella.com> wrote: >>>> >>>> Now this is becoming a really interesting discussion. Thanks everyone >>>> for pitching in! >>>> >>>> Here is my take on some of the proposed changes: >>>> >>>> We are talking about treating some constraints (NOT_NULL, JSON) as >>>> special cases by omitting the CHECK keyword (not reserved as per current >>>> implementation). Now, while this may seem like a nice approach to feature >>>> gaps on our CQL, it really worries me that by doing so we open the door to >>>> not needed complexity both at implementation and conceptually with the >>>> constraints framework. >>>> >>>> In my mind, what’s the constraints framework? It is a simple and really >>>> easy to extend integration point for validators for a row. (LENGHT, SCALAR, >>>> REGEX, are really good examples of it). >>>> >>>> What’s NOT the responsibility of the constraints framework? I don’t >>>> think this should be used to deliver partial solutions to feature gaps on >>>> CQL data modeling. Let’s take JSON constraint as an example. In the >>>> constraints case, it is as simple as checking that the provided string is >>>> valid json. Easy. Simple. But, how would JSON look like if it was a first >>>> class citizen in CQL? Setting the grammar aside, it would be handled >>>> differently probably. Things like: Can we store it better? Do we allow >>>> making queries for fields inside the json blob? Are there any optimizations >>>> that can be done when serializing/deserializing it? All of those definitely >>>> fall out of the scope of the constraints framework. So, I guess the >>>> question then becomes, is the JSON constraint a valid constraint to have? >>>> Just a temporal patch until (if) JSON type is in? Should we just remove it >>>> and keep ignoring JSON? Those are valid questions and discussions to have. >>>> But, I really think that we shouldn’t see this simple validator as a full >>>> fledged, first class citizen, type in CQL. Similar arguments could be have >>>> for the NOT_NULL constraint that has spawned so many interesting >>>> conversations. >>>> >>>> Now, having made that distinction, I don’t think we should have >>>> constraints that can be defined differently on the CQL statement. They >>>> should all have a CHECK keyword, specifying that they are a constraint that >>>> will be checked (aka, row value will be validated against whatever >>>> function). That’s easy to identify, and it’s conceptually easy to >>>> understand the limitations it comes with (as opposed to the JSON example >>>> mentioned above). >>>> >>>> Bernardo >>>> >>>> >>>> >>>> On Apr 14, 2025, at 10:53 AM, Štefan Miklošovič <smikloso...@apache.org> >>>> wrote: >>>> >>>> As Yifan already said, "check" is not a reserved word now and its usage >>>> does not collide with anything. >>>> >>>> If people have columns, tables, keyspaces with name "check" that seems >>>> to work already so they don't need to do anything: >>>> >>>> CREATE TABLE ks.tb (id int check id > 0, val int check val > 0, primary >>>> key (id)); >>>> >>>> ALTER TABLE ks.tb ADD check int check check > 0; >>>> >>>> DESCRIBE ks.tb; >>>> >>>> CREATE TABLE ks.tb ( >>>> id int CHECK id > 0 PRIMARY KEY, >>>> check int CHECK check > 0, >>>> val int CHECK val > 0 >>>> ) .... >>>> >>>> CREATE TABLE ks.check (id int check id > 0, check int check check > 0, >>>> primary key (id)); >>>> CREATE KEYSPACE check WITH replication = {'class': 'SimpleStrategy', >>>> 'replication_factor': 1}; >>>> CREATE TABLE check.check (check int check check > 0, val int check val >>>> > 0, primary key (check)); >>>> INSERT INTO check.check (check , val ) VALUES ( 1, 1); >>>> >>>> PostgreSQL has this: >>>> >>>> CREATE TABLE products ( >>>> product_no integer, >>>> name text, >>>> price numeric CHECK (price > 0) >>>> ); >>>> >>>> we follow this approach (minus parenthesis). We can also chain >>>> constraints whatever we like >>>> >>>> val int CHECK val > 0 and age < 100 >>>> >>>> We can make a stab in trying to model >>>> >>>> val int not null check val > 0 >>>> >>>> this is how PostgreSQL has it (1). >>>> >>>> but that would be more complicated on the implementation side because >>>> we would need to also accommodate "CQL describe" to dump it like that, plus >>>> I am not sure how complicated it would be to tweak the parser as well. >>>> >>>> I will try to make some progress and will report back. >>>> >>>> Regards >>>> >>>> (1) >>>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-NOT-NULL >>>> >>>> On Sun, Apr 13, 2025 at 6:49 PM Dinesh Joshi <djo...@apache.org> wrote: >>>> >>>> On Sun, Apr 13, 2025 at 9:24 AM Patrick McFadin <pmcfa...@gmail.com> >>>> wrote: >>>> >>>> I'm loving all the syntax discussion lately. It's a good debate and >>>> essential for the project's future with a good developer experience. >>>> >>>> >>>> +1 >>>> >>>> >>>> On NULL. I've been asked this a million times by end users. Why is >>>> there no "NOT NULL" in the schema? >>>> >>>> >>>> I would've expected this to be in billions by now ;) >>>> >>>> >>>> I'm in favor of the standard SQL syntax here because it's what users >>>> have been using forever: >>>> name text NOT NULL >>>> >>>> >>>> I hold a weak opinion on this. We don't have to necessarily align on to >>>> the standard SQL syntax. In my experience, users subconsciously feel >>>> Cassandra is a SQL database and try to design their schema to fit the >>>> traditional SQL / RDBMS design and then later are disappointed to find out >>>> it doesn't have joins or foreign key constraints. But that's just my >>>> personal experience working with users. However, I think we should strive >>>> for consistency and if it aligns with SQL I have no issues with the syntax. >>>> Just sharing my experience. >>>> >>>> >>>> On CHECK. Also in favor of making this a reserved word but in context. >>>> Namely, how Postgres SQL works. CHECK ( boolean_expression_on_column) >>>> >>>> >>>> Making CHECK a reserved keyword may result in issues that Scott >>>> described. It will present a substantial barrier for users to upgrade as >>>> applications will have to be updated. >>>> >>>> Thanks, >>>> >>>> Dinesh >>>> >>>> >>>> >>>>