I wonder how often users will apply constraints to tables that already
contain data while knowing the data is probably not compliant with the
constraint configuration. I humbly think that people are aware of this in
advance, and what usually happens is that some kind of job consolidates the
data (or migrates it to a new table) before admins put a "lid" on it, so
that moving forward nobody inserts anything that would violate the
constraint.

I probably have not kept myself up to date with the discussion, but I was
thinking that constraints effectively apply only on the write path. It is
not a constraint's job to refuse to return whatever is read.

On Tue, Jun 25, 2024 at 9:57 PM Dinesh Joshi <djo...@apache.org> wrote:

> Abe, that's a good point. We need to call out distinct use-cases here.
> When a fresh cluster is set up with constraints we don't have any issues
> because the data written and read back is going to be compliant with the
> constraint(s). For existing data in a cluster where new constraints are
> applied or existing constraints changed in such a way that may render
> existing data unreadable, we need a good user experience. This is what I
> propose –
>
> 1. When a constraint is added or changed in such a way that existing data
> could be rendered unreadable, we should warn the user.
>
> 2. Give the user a choice: either it is acceptable for the data to be
> rendered unreadable and an error is issued, or a warning is issued when a
> read violates the constraint but the data is still returned. New data
> going in will meet the constraint, but old data would need to be rewritten
> by the application to make it compliant.
>
> With this approach the application developer can decide what is right for
> their particular use-case. In many cases the application developer may
> decide to rewrite the data when they see a warning.
>
>
> On Tue, Jun 25, 2024 at 12:46 PM Abe Ratnofsky <a...@aber.io> wrote:
>
>> If we're going to introduce a feature that looks like SQL constraints, we
>> should make sure it's "reasonably" compliant. In particular, we should
>> avoid situations where a user creates a constraint, writes some data, then
>> reads data that violates that constraint, unless they've expressed that
>> violations on read would be acceptable.
>>
>> For Postgres, when adding a new constraint you can specify NOT VALID to
>> avoid scanning all existing relevant data[1]. If we want to avoid
>> scan-on-DDL, this tradeoff needs to be made clear to a user.
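>>
>> For reference, the Postgres pattern looks like this (standard Postgres
>> DDL; the table and constraint names are illustrative):
>>
>> ```sql
>> -- Add the constraint without scanning existing rows:
>> ALTER TABLE users
>>     ADD CONSTRAINT users_age_positive CHECK (age > 0) NOT VALID;
>>
>> -- New writes are checked immediately; existing rows are only
>> -- verified later, when the operator explicitly opts in:
>> ALTER TABLE users VALIDATE CONSTRAINT users_age_positive;
>> ```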
>>
>> As we've already discussed, constraints must deal with operations that
>> appear within limits on the write path but, once reconciled on read or
>> during compaction, lead to a violation. Adding to non-frozen collections
>> is one example. Expecting users to understand the write path for
>> collections feels unrealistic to me; I wonder if we should express in the
>> constraint itself that it only applies during write.
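>>
>> A concrete sketch of the collection hazard (the size limit here is
>> hypothetical; no such constraint syntax exists in CQL today):
>>
>> ```sql
>> -- Suppose labels were constrained to at most 3 elements.
>> CREATE TABLE tags (id int PRIMARY KEY, labels set<text>);
>>
>> -- Each write adds 2 elements and looks compliant in isolation:
>> UPDATE tags SET labels = labels + {'a', 'b'} WHERE id = 1;
>> UPDATE tags SET labels = labels + {'c', 'd'} WHERE id = 1;
>>
>> -- On read (or during compaction), the merged set {'a','b','c','d'}
>> -- has 4 elements and violates the limit, though no single write did.
>> ```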
>>
>> Anything that uses "nodetool import" (including cassandra-analytics)
>> could theoretically push constraint-violating mutations to a table. We
>> could update import to scan table contents first, or add a flag to trust
>> the data in imported SSTables and make cassandra-analytics executors aware
>> of table-level constraints.
>>
>> Some client implementations read the system_schema tables to build their
>> object mappers; I'd like to confirm that nothing will require clients to be
>> aware of these new schema constructs.
>>
>> Overall, I'm supportive of the distinctions discussed between constraints
>> and guardrails and like the direction this is heading; I'd just like to
>> make sure the more detailed semantics aren't confusing or misleading for
>> our users, since semantics are much harder to change in the future.
>>
>> [1]: https://www.postgresql.org/docs/current/sql-altertable.html
>>
>>
