Rereading this:

I do think any implementation of NOT NULL that has a way to let NULL in is
bad.  So I would be -1 on the proposal here that lets through INSERTs that
don’t specify the column

and also: " requiring that for INSERT, letting UPDATE be “user beware” -
and you -1 it as well, that looks like our "strict not null" is satisfying
this, yes.

On Mon, Feb 10, 2025 at 5:00 PM Jeremiah Jordan <jeremiah.jor...@gmail.com>
wrote:

> Having thought about this in the past, some options that have come up in
> those discussions  were:
>
>    1. Constraints forcing users to always specify a value for a given
>    column or all columns.  Only allow NOT NULL for columns with such a
>    constraint applied.
>    2. Similar to the above but only requiring that for INSERT, letting
>    UPDATE be “user beware”.
>    3. Forcing a read before write for all cases where it is not specified.
>       1. You have to consider some problem cases here with optimizing
>       this.  If you want to only do the check on the replica, you need to
>       correctly handle the case where the value only exists on some replicas 
> and
>       not others.
>
>
> I do think any implementation of NOT NULL that has a way to let NULL in is
> bad.  So I would be -1 on the proposal here that lets through INSERTs that
> don’t specify the column (also I would be -1 on the option 2 above, but I
> included it as something I have discussed with others in the past).
>
> -Jeremiah
>
> On Feb 10, 2025 at 9:27:52 AM, Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> I will create a Jira to keep track of that “NO VERIFY” suggestion. For
>> this thread, I’d like to stick to the actual proposal for both NOT_NULL and
>> STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch.
>>
>>
>> On Feb 10, 2025, at 7:18 AM, Benedict <bened...@apache.org> wrote:
>>
>> Thanks. While I agree we shouldn’t be applying these constraints post hoc
>> on read or compaction, I think we need to make clear to the user whether we
>> are validating a new constraint before accepting it for alter table. Which
>> is to say I think alter table should require something like “NO VERIFY” or
>> some other additional keywords to make clear we aren’t checking the
>> constraint applies to existing data.
>>
>>
>> On 10 Feb 2025, at 15:10, Bernardo Botella <conta...@bernardobotella.com>
>> wrote:
>>
>> Hi. These was a topic we discussed during the ML thread:
>> lists.apache.org
>> <https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj>
>> <favicon.ico>
>> <https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj>
>> <https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj>
>>
>> Here was one of my answers on that:
>> lists.apache.org
>> <https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv>
>> <favicon.ico>
>> <https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv>
>> <https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv>
>>
>> It was also specified in the CEP (
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime
>> ):
>> "Note: This constraints are only enforced at write time. So, an ALTER
>> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting
>> data.”
>>
>> Long story short, constraints are only checked at write time. If a
>> constraint is added to a table with preexisting offending data, that data
>> stays untouched.
>>
>> I hope this helps,
>> Bernardo
>>
>> On Feb 10, 2025, at 7:00 AM, Benedict <bened...@apache.org> wrote:
>>
>> This is counterintuitive to me. The constraint should be applied to the
>> table, not to the update. NOT NULL should imply a value is always specified.
>>
>> How are you handling this for tables that already exist? Can we alter
>> table to add constraints, and if so what are the semantics?
>>
>> On 10 Feb 2025, at 14:50, Bernardo Botella <conta...@bernardobotella.com>
>> wrote:
>>
>> Hi everyone,
>>
>> Stefan Miklosovic and I have been working on a NOT_NULL (
>> https://github.com/apache/cassandra/pull/3867) constraint to be added to
>> the constraints tool belt, and a really interesting conversation came up.
>>
>> First, as a problem statement, let's consider this:
>>
>> -----------------------------------------
>> CREATE TABLE ks.tb2 (
>>   id int,
>>   cl1 int,
>>   cl2 int,
>>   val text CHECK NOT_NULL(val),
>>   PRIMARY KEY (id, cl1, cl2)
>> )
>>
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2,
>> 3, null);
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Column value does not satisfy value constraint for column 'val' as
>> it is null."
>>
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2,
>> 3, “text");
>> cassandra@cqlsh> select * from ks.tb2;
>>
>> id | cl1 | cl2 | val
>> ----+-----+-----+------
>> 1 |   2 |   3 | text
>>
>> (1 rows)
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>> cassandra@cqlsh> select * from ks.tb2;
>>
>> id | cl1 | cl2 | val
>> ----+-----+-----+------
>> 1 |   2 |   3 | text
>> 1 |   2 |   4 | null
>>
>> -----------------------------------------
>>
>> As you see, we have a hole in which a 'null' value is getting written on
>> column val even if we have a NOT_NULL on that particular column whenever
>> the column is NOT specified on the write. That raises the question on how
>> this particular constraint should behave.
>>
>> If we consider the other constraints (scalar constraint and length
>> constraint so far), this particular behavior is fine. But, if the
>> constraint is NOT_NULL, then it becomes a little bit trickier.
>>
>> The conclusions we have reached is that the meaning of constraints should
>> be interpreted like: I check whatever you give me as part of the write,
>> ignoring everything else. Let me elaborate:
>> If we decide to treat this particular NOT_NULL constraint differently,
>> and check if the value for that column is present in the insert statement,
>> we then open a different can of worms. What happens if the row already
>> exists with a valid value, and that insert statement is only trying to do
>> an update to a different column in the row? If that was the case, we would
>> be forcing the user to specify the 'val' column value for every update,
>> even if it is not needed.
>>
>> Mainly for this reason, we think it is better to treat this NOT_NULL
>> constraint just like the other constraints, and execute it ONLY on the
>> values that are present on the insert statement.
>>
>> The main con is that it may lead to a little bit of confussion (as in,
>> why I just added a null value to the table even if I have a NOT_NULL
>> constraint?). We have thought on aliviating this particular confusion by:
>> - Extensive documentation. Let's be upfront on what this constraint does
>> and does not. (
>> https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint
>> )
>> - Adding, as part of this patch, yet another constraint
>> (STRICTLY_NOT_NULL), that checks for the actual column value to be present
>> in the insert statement..
>>
>> If you've made it until here, that means you are really interested in
>> constraints. Thanks! The question for you is, would you have any concern
>> with this approach?
>>
>> Thanks,
>> Bernardo
>>
>>
>>
>>

Reply via email to