Abe, that's a good point. We need to call out the distinct use cases here. When a fresh cluster is set up with constraints, there is no issue, because all data written and read back will comply with the constraint(s). For existing data in a cluster where new constraints are applied, or where existing constraints are changed in a way that may render existing data unreadable, we need a good user experience. This is what I propose:
1. When a constraint is added or changed in such a way that existing data could be rendered unreadable, we should warn the user.
2. Give the user a choice between two behaviors when a read violates the constraint: either an error is issued and the data is rendered unreadable, or a warning is issued and the data remains readable.

New data going in will meet the constraint, but the application would need to rewrite old data to make it compliant. With this approach the application developer can decide what is right for their particular use case. In many cases the application developer may decide to rewrite the data when they see a warning.

On Tue, Jun 25, 2024 at 12:46 PM Abe Ratnofsky <a...@aber.io> wrote:

> If we're going to introduce a feature that looks like SQL constraints, we
> should make sure it's "reasonably" compliant. In particular, we should
> avoid situations where a user creates a constraint, writes some data, then
> reads data that violates that constraint, unless they've expressed that
> violations on read would be acceptable.
>
> For Postgres, when adding a new constraint you can specify NOT VALID to
> avoid scanning all existing relevant data[1]. If we want to avoid
> scan-on-DDL, this tradeoff needs to be made clear to a user.
>
> As we've already discussed, constraints must deal with operations that
> appear within limits on the write path, but once reconciled on read or
> during compaction can lead to a violation. Adding to non-frozen collections
> is one example. Expecting users to understand the write path for
> collections feels unrealistic to me; I wonder if we should express in the
> constraint itself that it only applies during write.
>
> Anything that uses "nodetool import" (including cassandra-analytics) could
> theoretically push constraint-violating mutations to a table. We could
> update import to scan table contents first, or add a flag to trust the data
> in imported SSTables and make cassandra-analytics executors aware of
> table-level constraints.
>
> Some client implementations read the system_schema tables to build their
> object mappers, I'd like to confirm that nothing will require clients to be
> aware of these new schema constructs.
>
> Overall, I'm supportive of the distinctions discussed between constraints
> and guardrails and like the direction this is heading; I'd just like to
> make sure the more detailed semantics aren't confusing or misleading for
> our users, and semantics are much harder to change in the future.
>
> [1]: https://www.postgresql.org/docs/current/sql-altertable.html
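
Coming back to the two read-time behaviors in my proposal above, here is a minimal sketch of how they might differ. This is only an illustration in Python, not Cassandra code: `ReadViolationPolicy`, `ConstraintViolation`, `read_with_constraint`, and the length constraint itself are all hypothetical names I made up for the example.

```python
from enum import Enum


class ReadViolationPolicy(Enum):
    """Hypothetical per-constraint setting the user picks when altering it."""
    ERROR = "error"  # non-compliant old data becomes unreadable
    WARN = "warn"    # non-compliant old data stays readable, with a warning


class ConstraintViolation(Exception):
    """Raised on read when the policy is ERROR and the value is non-compliant."""


def read_with_constraint(value, max_length, policy):
    """Check a hypothetical 'length <= max_length' constraint on read.

    Returns (value, warnings). Raises ConstraintViolation under ERROR.
    """
    if len(value) <= max_length:
        return value, []
    message = f"value of length {len(value)} violates length <= {max_length}"
    if policy is ReadViolationPolicy.ERROR:
        raise ConstraintViolation(message)
    # WARN: hand the data back anyway and let the application decide
    # whether to rewrite it into a compliant form.
    return value, [message]


# A value written before the constraint was added.
old_value = "written-before-constraint"
value, warnings = read_with_constraint(old_value, 10, ReadViolationPolicy.WARN)
```

Under WARN the application still gets the old value plus a warning it can act on, for example by rewriting the row; under ERROR the same read fails until the data is fixed.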