Re: Constraint's "not null" alignment with transactions and their simplification

Štefan Miklošovič Fri, 11 Apr 2025 09:36:57 -0700

We can think about this. I don't mind supporting both. It needs a little
bit more love in parser I guess. Most probably doable, as syntactic sugar.


On Fri, Apr 11, 2025 at 6:25 PM Jon Haddad <[email protected]> wrote:

> I agree NOT NULL is nicer than check not null.
>
> Taking the JSON bit a step further, a dedicated JSON type is better than a
> check, but that's probably out of scope.
>
> For inspiration, I've worked with Postgre's JSONB type in the past, and it
> was awesome.
>
> https://www.postgresql.org/docs/current/functions-json.html
>
> On Fri, Apr 11, 2025 at 9:15 AM Benedict <[email protected]> wrote:
>
>> [image: elephant.png]
>>
>> 5.5. Constraints
>> <https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS>
>> postgresql.org
>> <https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS>
>>
>> <https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS>
>>
>>
>> See the note directly above section 5.5.2 - Postgres supports functions
>> in the CHECK predicate and requires they are pure functions (ie always
>> yield the same result).
>>
>> The point is, CHECK is consistently a predicate defined over a field
>> expression. If we aren’t matching the feature, why are we matching the
>> keyword?
>>
>> I also disagree that CHECK is a useful prefix to all constraints, and
>> would prefer consistency with Postgres here that also uses plain NOT NULL
>> for a NOT NULL constraint.
>>
>>
>> On 11 Apr 2025, at 17:05, Štefan Miklošovič <[email protected]>
>> wrote:
>>
>> 
>> I went through Postgres' docs in a quite elaborate manner and I do not
>> see any usage of "constraint functions" as we have them. Both MySQL and
>> Postgres seem to only provide simple predicates (using relational
>> operators) and "functions" as we have them (e.g. json / length) are not
>> supported. So in this regard we are providing more than they do. Where have
>> you taken "is_json(field)" form from? Not saying it's wrong, I am just
>> curious where this is coming from.
>>
>> I do not like that we would have "NOT NULL" without "CHECK". I think we
>> can go a little bit our own way as we have the comfort of modelling this
>> from scratch. CQL is already different from SQL as is and I do not think
>> that trying to follow SQL _orthodoxly_ is absolutely necessary but at the
>> same time I find it easier and more welcoming for users coming to Cassandra
>> for the first time to have syntax which is as close as possible to what
>> they are used to.
>>
>> I find having constraints starting with "CHECK" _every time_ consistent.
>> They do not need to think twice if "check" is going to be there or not. It
>> is there every time. I do not know why SQL did not do it the same way, most
>> probably because "NOT NULL" was the first being introduced and "CHECK"
>> followed afterwards and it was just too late to it consistent.
>>
>>
>>
>> On Fri, Apr 11, 2025 at 5:33 PM Bernardo Botella <
>> [email protected]> wrote:
>>
>>> Benedict:
>>>
>>> An alternative for that, keeping the CHECK word, would be to change the
>>> constraint name to IS_JSON. CHECK IS_JSON would read as you intend without
>>> the need to jump to REQUIRE. I think that’s true for the rest of provided
>>> constraints as well.
>>>
>>> Bernardo
>>>
>>>
>>> On Apr 11, 2025, at 6:02 AM, Benedict <[email protected]> wrote:
>>>
>>> We have taken a different approach though, as we do not actually take a
>>> predicate on the RHS and do not supply the column name. In our examples we
>>> had eg CHECK JSON, which doesn’t parse unambiguously to a human. The
>>> equivalent to Postgres would seem to be CHECK is_json(field).
>>>
>>> I’m all for following an existing example, but once we decide to diverge
>>> the justification is gone and we should decide holistically what we think
>>> is best. So if we want to elide the column entirely and have a list of
>>> built in restrictions, I’d prefer eg REQUIRE JSON since this parses
>>> unambiguously to a human, whereas if we want to follow Postgres let’s do
>>> that but do it but that means eg CHECK is_json(field).
>>>
>>> On 11 Apr 2025, at 10:57, Štefan Miklošovič <[email protected]>
>>> wrote:
>>>
>>> 
>>> While modelling that, we followed how it is done in SQL world,
>>> PostgreSQL as well as MySQL both use CHECK.
>>>
>>>
>>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
>>>
>>> https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html
>>>
>>> On Fri, Apr 11, 2025 at 10:43 AM Benedict <[email protected]> wrote:
>>>
>>>> I would prefer require/expect/is over check
>>>>
>>>> On 11 Apr 2025, at 08:05, Štefan Miklošovič <[email protected]>
>>>> wrote:
>>>>
>>>> 
>>>> Yes, you will have it like that :) Thank you for this idea. Great
>>>> example of cooperation over diverse domains.
>>>>
>>>> On Fri, Apr 11, 2025 at 12:29 AM David Capwell <[email protected]>
>>>> wrote:
>>>>
>>>>> I am biased but I do prefer
>>>>>
>>>>> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
>>>>>
>>>>> Here is a similar accord CQL
>>>>>
>>>>> BEGIN TRANSACTION
>>>>>   LET a = (…);
>>>>>   IF a IS NOT NULL
>>>>>       AND a.b IS NOT NULL
>>>>>       AND a.c IS NULL; THEN
>>>>>     — profit
>>>>>   END IF
>>>>> COMMIT TRANSACTION
>>>>>
>>>>> On Apr 10, 2025, at 8:46 AM, Yifan Cai <[email protected]> wrote:
>>>>>
>>>>> Re: reserved keywords, “check” is currently not, and I don’t think it
>>>>> needs to be a reserved keyword with the proposal.
>>>>>
>>>>> ------------------------------
>>>>> *From:* C. Scott Andreas <[email protected]>
>>>>> *Sent:* Thursday, April 10, 2025 7:59:35 AM
>>>>> *To:* [email protected] <[email protected]>
>>>>> *Cc:* [email protected] <[email protected]>
>>>>> *Subject:* Re: Constraint's "not null" alignment with transactions
>>>>> and their simplification
>>>>>
>>>>> If the proposal does not introduce “check” as a reserved keyword that
>>>>> would require quoting in existing DDL/DML, this concern doesn’t apply and
>>>>> the email below can be ignored. This might be the case if “CHECK NOT NULL”
>>>>> is the full token introduced rather than “CHECK” separately from
>>>>> constraints that are checked.
>>>>>
>>>>> If “check” is introduced as a standalone reserved keyword: my primary
>>>>> feedback is on the introduction of reserved words in the CQL grammar that
>>>>> may affect compatibility of existing schemas.
>>>>>
>>>>> In the Cassandra 3.x series, several new CQL reserved words were added
>>>>> (more than necessary) and subsequently backed out, because it required
>>>>> users to begin quoting schemas and introduced incompatibility between 3.x
>>>>> and 4.x for queries and DDL that “just worked” before.
>>>>>
>>>>> The word “check” is used in many domains (test/evaluation engineering,
>>>>> finance, business processes, etc) and is likely to be used in user 
>>>>> schemas.
>>>>> If the proposal introduces this as a reserved word that would require it 
>>>>> to
>>>>> be quoted if used in table or column names, this will create
>>>>> incompatibility for existing user queries on upgrade.
>>>>>
>>>>> Otherwise, ignore me. :)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> – Scott
>>>>>
>>>>> –––
>>>>> Mobile
>>>>>
>>>>> On Apr 10, 2025, at 7:47 AM, Jon Haddad <[email protected]>
>>>>> wrote:
>>>>>
>>>>> 
>>>>> This looks like a really nice improvement to me.
>>>>>
>>>>>
>>>>> On Thu, Apr 10, 2025 at 7:27 AM Štefan Miklošovič <
>>>>> [email protected]> wrote:
>>>>>
>>>>> Recently, David Capwell was commenting on constraints in one of Slack
>>>>> threads (1) in dev channel and he suggested that the current form of "not
>>>>> null" constraint we have right now in place, e.g like this
>>>>>
>>>>> create table ks.tb (id int primary key, val int check not_null(val));
>>>>>
>>>>> could be instead of that form used like this:
>>>>>
>>>>> create table ks.tb (id int primary key, val int check not null);
>>>>>
>>>>> That is - without the name of a column in the constraint's argument.
>>>>> The reasoning behind that was that it is not only easier to read but there
>>>>> is also this concept in transactions (cep-15) where there is also "not
>>>>> null" used in some fashion and it would be nice if this was aligned so a
>>>>> user does not encounter two usages of "not null"-s which are written down
>>>>> differently, syntax-wise.
>>>>>
>>>>> Could the usage of "not null" in transactions be confirmed?
>>>>>
>>>>> This rather innocent suggestion brought an idea to us that constraints
>>>>> could be quite simplified when it comes to their syntax, consider this:
>>>>>
>>>>> val int check not_null(val)
>>>>> val text check json(val)
>>>>> val text check lenght(val) < 1000
>>>>>
>>>>> to be used like this:
>>>>>
>>>>> val int check not null
>>>>> val text check json
>>>>> val text check length() < 1000
>>>>>
>>>>> more involved checks like this:
>>>>>
>>>>> val text check not_null(val) and json(val) and length(val) < 1000
>>>>>
>>>>> might be just simplified to:
>>>>>
>>>>> val text check not null and json and length() < 1000
>>>>>
>>>>> It almost reads like plain English. Isn't this just easier for an eye?
>>>>>
>>>>> The reason we kept the column names in constraint definitions is that,
>>>>> frankly speaking, we just did not know any better at the time it was about
>>>>> to be implemented. It is a little bit more tricky to be able to use it
>>>>> without column names because in Parser.g / Antlr we just bound the grammar
>>>>> around constraints to a column name directly there. When column names are
>>>>> not going to be there anymore, we need to bind it later in the code behind
>>>>> the parser in server code. It is doable, it was just about being a little
>>>>> bit more involved there.
>>>>>
>>>>> Also, one reason to keep the name of a column was that we might
>>>>> specify different columns in a constraint from a column that is defined on
>>>>> to have cross-column constraints but we abandoned this idea altogether for
>>>>> other reasons which rendered the occurrence of a column name in a
>>>>> constraint definition redundant.
>>>>>
>>>>> To have some overview of what would be possible to do with this
>>>>> proposal:
>>>>>
>>>>> val3 text CHECK SOMECONSTRAINT('a');
>>>>> val3 text CHECK JSON;
>>>>> val3 text CHECK SOMECONSTRAINT('a') > 1;
>>>>> val3 text CHECK SOMECONSTRAINT('a', 'b', 'c') > 1;
>>>>> val3 text CHECK JSON AND LENGTH() < 600;
>>>>> afternoon time CHECK afternoon >= '12:00:00' AND afternoon =<
>>>>> '23:59:59';
>>>>> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
>>>>>
>>>>> In addition to the specification of constraints without columns, what
>>>>> would be possible to do is to also specify arguments to constraints. It is
>>>>> currently not possible and there is no constraint which would accept
>>>>> arguments to its function but I think that to be as flexible as possible
>>>>> and prepare for the future, we might implement it as well.
>>>>>
>>>>> Constraints in their current form are already usable however I just
>>>>> think that if we do not simplify, align and extend the syntax right now,
>>>>> before it is baked in in a release, then we will never do it as it will be
>>>>> quite tricky to extend this without breaking it and maintaining two
>>>>> grammars at the same time would be very complex if not flat out 
>>>>> impossible.
>>>>>
>>>>> Are you open to the simplification of constraint definitions as
>>>>> suggested and what is your feedback about that? I already have a working
>>>>> POC which just needs to be polished and tests fixed to accommodate the new
>>>>> approach.
>>>>>
>>>>> Regards
>>>>>
>>>>> (1) https://the-asf.slack.com/archives/CK23JSY2K/p1742409054164389
>>>>>
>>>>>
>>>>>
>>>

Re: Constraint's "not null" alignment with transactions and their simplification

Reply via email to