Re: [DISCUSS] CEP-42: Constraints Framework

Jordan West Sun, 23 Jun 2024 14:39:19 -0700

I am generally in favor of this CEP, particularly the sizeOf guardrail. For
example, we recently had an incident caused by a client who wrote outside
of the contract we had verbally established. The constraint would have let
us encode that contract into the database. In this case, clients are
writing large blobs at the application layer and internally the client
performs chunking. We had established a chunk size of 64k, for example.
However, the application team wanted to use a different programming
language than the ones we provide clients for, so they wrote their own. The
new client had a bug that did not honor the agreed-upon chunk size and
wrote chunks that were MBs in size. This eventually led to a production
incident, and the issue was discovered only after a lot of analysis
(dumping sstables, etc.). Had we had the sizeOf guardrail, it would have
turned a production incident with hours of investigation into a bug found
immediately during development. Could this be done with a node-level
guardrail? Likely. But config has the issues described above, and it's
possible to have two tables with different constraints around similar
fields (for example, two different chunk size configs due to data shape).
Could it be done at the client layer? Yes, that's what we are doing now, but
this incident highlights the weakness of that approach (having to
implement the contract everywhere and having disjoint features across
clients).
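A minimal sketch of the idea, assuming a hypothetical per-table size constraint that is checked when a write arrives. The names `SizeOfConstraint`, `TableSchema`, `validate_write`, and `ConstraintViolation` are illustrative only and are not CEP-42's proposed API or syntax. It shows two tables with similar fields enforcing different chunk limits, so a client that ignores the agreed 64k chunk size fails at write time instead of silently storing MB-sized chunks.

```python
from dataclasses import dataclass, field


class ConstraintViolation(Exception):
    """Raised when a write breaks a table-level constraint (hypothetical)."""


@dataclass(frozen=True)
class SizeOfConstraint:
    """Hypothetical per-column cap on the size of a blob, in bytes."""
    column: str
    max_bytes: int

    def check(self, row: dict) -> None:
        value = row.get(self.column)
        if value is not None and len(value) > self.max_bytes:
            raise ConstraintViolation(
                f"{self.column}: {len(value)} bytes exceeds the "
                f"{self.max_bytes}-byte limit"
            )


@dataclass
class TableSchema:
    name: str
    constraints: list = field(default_factory=list)

    def validate_write(self, row: dict) -> None:
        # Each check looks only at the incoming row, so it runs on the
        # write path without reading any existing data.
        for constraint in self.constraints:
            constraint.check(row)


# Two tables with similar fields but different chunk-size contracts.
chunks_64k = TableSchema("chunks_64k", [SizeOfConstraint("chunk", 64 * 1024)])
chunks_256k = TableSchema("chunks_256k", [SizeOfConstraint("chunk", 256 * 1024)])

chunks_64k.validate_write({"id": 1, "chunk": b"x" * (64 * 1024)})    # accepted
chunks_256k.validate_write({"id": 2, "chunk": b"x" * (200 * 1024)})  # accepted

try:
    # A client that ignores the 64k contract is rejected immediately.
    chunks_64k.validate_write({"id": 3, "chunk": b"x" * (2 * 1024 * 1024)})
except ConstraintViolation as err:
    print("rejected:", err)
```

Because the limit lives in the table definition rather than in each client, a new client written in another language gets the same contract without reimplementing it.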

I also think there is benefit to application owners. Encoding constraints
in the database ensures continuity as ownership and contributors change and
reduces the need for comments or documentation as the means to enforce or
share this knowledge.

I think enforcing them at write time makes sense. Thinking about enforcing
them during compaction, for example, reminds me of a data loss incident where
someone ran a validation in an older version (like 2.0 or 2.1) and a number
of 4-byte ints were thrown away because the field expected an 8-byte long.

My primary concern would be ensuring that we don't implement constraints
that require a read before write (not inList comes to mind as an example of
one that could imply reading before writing, and could confuse a user if it
doesn't).
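One way to keep that guarantee, sketched here under the assumption that each constraint can declare whether it is decidable from the incoming row alone, is to reject read-before-write constraints when the table is defined. All names (`Constraint`, `row_check`, `create_table`, `validate_write`, `SchemaError`) are hypothetical, and uniqueness stands in as a generic example of a check that needs stored data.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Constraint:
    name: str
    # Predicate over the incoming row only. None marks a check that would
    # need data already stored in the table, i.e. a read before the write.
    row_check: Optional[Callable[[dict], bool]] = None

    @property
    def requires_read(self) -> bool:
        return self.row_check is None


class SchemaError(Exception):
    pass


def create_table(name: str, constraints: list) -> dict:
    # Reject read-before-write constraints when the schema is defined,
    # rather than silently skipping them on the write path.
    needs_read = [c.name for c in constraints if c.requires_read]
    if needs_read:
        raise SchemaError(f"{name}: read-before-write constraints not allowed: {needs_read}")
    return {"name": name, "constraints": list(constraints)}


def validate_write(table: dict, row: dict) -> bool:
    # Only row-local checks can reach this point, so no read is issued.
    return all(c.row_check(row) for c in table["constraints"])


positive_qty = Constraint("qty > 0", row_check=lambda row: row["qty"] > 0)
# Uniqueness cannot be decided from the incoming row alone.
unique_email = Constraint("email is unique")

orders = create_table("orders", [positive_qty])
print(validate_write(orders, {"qty": 3}))  # True
print(validate_write(orders, {"qty": 0}))  # False

try:
    create_table("users", [unique_email])
except SchemaError as err:
    print(err)
```

Failing at schema time makes the limitation explicit to the user instead of leaving a constraint that looks enforced but is not.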

Regarding the conflict with existing guardrails, I do think that is
tougher. On one hand, I find this feature to be more evolved than those
guardrails and would be fine to see them replaced by it. On the other,
the guardrails give the operator sole control, which is nice but adds
some complexity that has been rightly called out. But I don't see that as
a reason not to go forward with this feature. We should pick a path and
accept the tradeoffs.

Jordan

On Thu, Jun 13, 2024 at 2:39 PM Bernardo Botella <
[email protected]> wrote:

> Thanks a lot for your comments, Abe!
>
> I do agree that the Constraint clause should be as simple as possible. I
> will add a note on the CEP along with some specifics about the proposed
> constraints (removing the ones that are contentious, and adding them to a
> possible future additions section). And yeah, I also think that these
> constraints will help different Cassandra operating paradigms (multi-tenant
> clusters and diverse workflows).
>
> Besides that, I hope that I’ve addressed all the potential concerns and
> feedback on the thread. Let’s allow a bit more time for others to chime in
> (any further feedback will be more than welcome), but I’d like to move
> forward with a vote soon if no other concerns are pointed out.
>
> All in all, thanks a lot to everyone who participated in the thread and
> added to the discussion!
> Bernardo
>
>
>
> > On Jun 12, 2024, at 2:37 PM, Abe Ratnofsky <[email protected]> wrote:
> >
> > I've thought about this some more. It would be useful for Cassandra to
> support user-defined "guardrails" (or constraints, whatever you want to
> call them), that could be applied per keyspace or table. Whether a user or
> an operator is considered the owner of a table depends on the organization
> deploying Cassandra, so allowing both parties to protect their tables
> against misuse seems good to me, especially for large multi-tenant
> clusters with diverse workloads.
> >
> > For example, it would be really useful if a user could set the
> Guardrails.{read,write}ConsistencyLevels for their tables, or declare
> whether all operations should be over LWTs to avoid mixing regular and LWT
> workloads.
> >
> > I'm hesitant about adding lots of expression syntax to the CONSTRAINT
> clause. I think I'd prefer a function-calling syntax that represents:
> > 1. Whether the constraint is system / keyspace / table scoped
> > 2. Where in query processing the constraint is checked
> > 3. What is executed by the check
>
>
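Abe's three-part shape (scope, phase, check) can be sketched as a plain data structure. This assumes hypothetical names throughout (`Scope`, `Phase`, `ConstraintCall`, `applicable`), is not the syntax proposed in CEP-42, and uses two placeholder phases in place of wherever such checks would actually hook into query processing. The sample rules mirror his examples: a table owner restricting consistency levels, and a keyspace that requires every operation to be an LWT.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Scope(Enum):
    SYSTEM = "system"
    KEYSPACE = "keyspace"
    TABLE = "table"


class Phase(Enum):
    # Hypothetical stages of query processing where a check could run.
    PREPARE = "prepare"
    EXECUTE = "execute"


@dataclass(frozen=True)
class ConstraintCall:
    scope: Scope                   # 1. system / keyspace / table scoped
    target: str                    # "" (system), "ks" (keyspace) or "ks.tbl" (table)
    phase: Phase                   # 2. where in query processing it is checked
    check: Callable[[dict], bool]  # 3. what is executed by the check


def applicable(rules, keyspace: str, table: str, phase: Phase) -> list:
    """Return the rules that cover keyspace.table and run during `phase`."""
    qualified = f"{keyspace}.{table}"
    matches = []
    for rule in rules:
        in_scope = (
            rule.scope is Scope.SYSTEM
            or (rule.scope is Scope.KEYSPACE and rule.target == keyspace)
            or (rule.scope is Scope.TABLE and rule.target == qualified)
        )
        if in_scope and rule.phase is phase:
            matches.append(rule)
    return matches


rules = [
    # A table owner restricting the consistency levels used against one table.
    ConstraintCall(Scope.TABLE, "shop.orders", Phase.EXECUTE,
                   lambda q: q["consistency"] in {"QUORUM", "LOCAL_QUORUM"}),
    # A keyspace where every operation must be an LWT.
    ConstraintCall(Scope.KEYSPACE, "ledger", Phase.EXECUTE,
                   lambda q: q["is_lwt"]),
]

query = {"consistency": "ONE", "is_lwt": False}
allowed = all(r.check(query) for r in applicable(rules, "shop", "orders", Phase.EXECUTE))
print(allowed)  # False
```

Keeping scope and phase as explicit fields, rather than folding them into an expression grammar, means the engine can decide which checks apply to a query without evaluating any of them.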
