Rereading this: I do think any implementation of NOT NULL that has a way to let NULL in is bad. So I would be -1 on the proposal here that lets through INSERTs that don’t specify the column
and also: " requiring that for INSERT, letting UPDATE be “user beware” - and you -1 it as well, that looks like our "strict not null" is satisfying this, yes. On Mon, Feb 10, 2025 at 5:00 PM Jeremiah Jordan <jeremiah.jor...@gmail.com> wrote: > Having thought about this in the past, some options that have come up in > those discussions were: > > 1. Constraints forcing users to always specify a value for a given > column or all columns. Only allow NOT NULL for columns with such a > constraint applied. > 2. Similar to the above but only requiring that for INSERT, letting > UPDATE be “user beware”. > 3. Forcing a read before write for all cases where it is not specified. > 1. You have to consider some problem cases here with optimizing > this. If you want to only do the check on the replica, you need to > correctly handle the case where the value only exists on some replicas > and > not others. > > > I do think any implementation of NOT NULL that has a way to let NULL in is > bad. So I would be -1 on the proposal here that lets through INSERTs that > don’t specify the column (also I would be -1 on the option 2 above, but I > included it as something I have discussed with others in the past). > > -Jeremiah > > On Feb 10, 2025 at 9:27:52 AM, Bernardo Botella < > conta...@bernardobotella.com> wrote: > >> I will create a Jira to keep track of that “NO VERIFY” suggestion. For >> this thread, I’d like to stick to the actual proposal for both NOT_NULL and >> STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch. >> >> >> On Feb 10, 2025, at 7:18 AM, Benedict <bened...@apache.org> wrote: >> >> Thanks. While I agree we shouldn’t be applying these constraints post hoc >> on read or compaction, I think we need to make clear to the user whether we >> are validating a new constraint before accepting it for alter table. Which >> is to say I think alter table should require something like “NO VERIFY” or >> some other additional keywords to make clear we aren’t checking the >> constraint applies to existing data. >> >> >> On 10 Feb 2025, at 15:10, Bernardo Botella <conta...@bernardobotella.com> >> wrote: >> >> Hi. These was a topic we discussed during the ML thread: >> lists.apache.org >> <https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj> >> <favicon.ico> >> <https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj> >> <https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj> >> >> Here was one of my answers on that: >> lists.apache.org >> <https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv> >> <favicon.ico> >> <https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv> >> <https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv> >> >> It was also specified in the CEP ( >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime >> ): >> "Note: This constraints are only enforced at write time. So, an ALTER >> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting >> data.” >> >> Long story short, constraints are only checked at write time. If a >> constraint is added to a table with preexisting offending data, that data >> stays untouched. >> >> I hope this helps, >> Bernardo >> >> On Feb 10, 2025, at 7:00 AM, Benedict <bened...@apache.org> wrote: >> >> This is counterintuitive to me. The constraint should be applied to the >> table, not to the update. NOT NULL should imply a value is always specified. >> >> How are you handling this for tables that already exist? Can we alter >> table to add constraints, and if so what are the semantics? >> >> On 10 Feb 2025, at 14:50, Bernardo Botella <conta...@bernardobotella.com> >> wrote: >> >> Hi everyone, >> >> Stefan Miklosovic and I have been working on a NOT_NULL ( >> https://github.com/apache/cassandra/pull/3867) constraint to be added to >> the constraints tool belt, and a really interesting conversation came up. >> >> First, as a problem statement, let's consider this: >> >> ----------------------------------------- >> CREATE TABLE ks.tb2 ( >> id int, >> cl1 int, >> cl2 int, >> val text CHECK NOT_NULL(val), >> PRIMARY KEY (id, cl1, cl2) >> ) >> >> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, >> 3, null); >> InvalidRequest: Error from server: code=2200 [Invalid query] >> message="Column value does not satisfy value constraint for column 'val' as >> it is null." >> >> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, >> 3, “text"); >> cassandra@cqlsh> select * from ks.tb2; >> >> id | cl1 | cl2 | val >> ----+-----+-----+------ >> 1 | 2 | 3 | text >> >> (1 rows) >> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4); >> cassandra@cqlsh> select * from ks.tb2; >> >> id | cl1 | cl2 | val >> ----+-----+-----+------ >> 1 | 2 | 3 | text >> 1 | 2 | 4 | null >> >> ----------------------------------------- >> >> As you see, we have a hole in which a 'null' value is getting written on >> column val even if we have a NOT_NULL on that particular column whenever >> the column is NOT specified on the write. That raises the question on how >> this particular constraint should behave. >> >> If we consider the other constraints (scalar constraint and length >> constraint so far), this particular behavior is fine. But, if the >> constraint is NOT_NULL, then it becomes a little bit trickier. >> >> The conclusions we have reached is that the meaning of constraints should >> be interpreted like: I check whatever you give me as part of the write, >> ignoring everything else. Let me elaborate: >> If we decide to treat this particular NOT_NULL constraint differently, >> and check if the value for that column is present in the insert statement, >> we then open a different can of worms. What happens if the row already >> exists with a valid value, and that insert statement is only trying to do >> an update to a different column in the row? If that was the case, we would >> be forcing the user to specify the 'val' column value for every update, >> even if it is not needed. >> >> Mainly for this reason, we think it is better to treat this NOT_NULL >> constraint just like the other constraints, and execute it ONLY on the >> values that are present on the insert statement. >> >> The main con is that it may lead to a little bit of confussion (as in, >> why I just added a null value to the table even if I have a NOT_NULL >> constraint?). We have thought on aliviating this particular confusion by: >> - Extensive documentation. Let's be upfront on what this constraint does >> and does not. ( >> https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint >> ) >> - Adding, as part of this patch, yet another constraint >> (STRICTLY_NOT_NULL), that checks for the actual column value to be present >> in the insert statement.. >> >> If you've made it until here, that means you are really interested in >> constraints. Thanks! The question for you is, would you have any concern >> with this approach? >> >> Thanks, >> Bernardo >> >> >> >>