Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

Štefan Miklošovič Mon, 11 Aug 2025 02:41:38 -0700

I think there are, in theory, at least two ways you could model this
(maybe there are more?).


1) Serialize these annotations and save them in TCM as part of
ColumnMetadata.

and additionally

2) Have a dedicated table for them, as is done for masks (look into
addTableToSchemaMutation and addColumnToSchemaMutation). That code checks
whether there are any masks on a column and, if there are, adds a new
mutation which records this fact in system_schema.column_masks. A similar
concept is used for e.g. triggers and dropped columns, so there is a
dedicated table for each of these. Hence you could have
system_schema.annotations, which would be populated accordingly.

For 1), when it comes to annotations on columns, you might just follow how
it was done for CEP-42 / constraints. For annotations on other elements
(keyspaces, tables themselves etc.), I am not completely sure, as it is way
more involved to cover them in such depth and it would need further
investigation.

For 2), there would need to be one common table for every element which
can be annotated, otherwise we would have to have an "annotation table per
CQL element", and I do not think that is necessary. On the other hand, I do
not know what the schema of such a table would look like, because some
entries would have a keyspace and a table, some only a keyspace (the
keyspace itself), and functions and aggregates do not have a table assigned
to them, etc.
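
One possible shape for such a common table, sketched as a plain Python
model rather than real schema code: every row is keyed by an element kind
plus name parts, and the parts which do not apply stay empty. ElementKind,
AnnotationKey and the empty-string convention are assumptions made up for
illustration; nothing in the CEP prescribes them.

```python
from dataclasses import dataclass
from enum import Enum


class ElementKind(Enum):
    KEYSPACE = "keyspace"
    TABLE = "table"
    COLUMN = "column"
    FUNCTION = "function"
    AGGREGATE = "aggregate"


@dataclass(frozen=True)
class AnnotationKey:
    """Hypothetical primary key of a row in a common annotations table."""
    keyspace_name: str
    kind: ElementKind
    element_name: str = ""  # table / function / aggregate name; "" for a keyspace
    column_name: str = ""   # set only when kind is COLUMN


# annotations per element, mirroring a map<text, text> column
rows: dict[AnnotationKey, dict[str, str]] = {
    AnnotationKey("test_ks", ElementKind.KEYSPACE): {"DESCRIPTION": "test keyspace"},
    AnnotationKey("test_ks", ElementKind.TABLE, "test_table"): {"DESCRIPTION": "test table"},
    AnnotationKey("test_ks", ElementKind.COLUMN, "test_table", "col2"): {"PII": ""},
    AnnotationKey("test_ks", ElementKind.FUNCTION, "my_func"): {"DESCRIPTION": "a UDF"},
}
```

Functions and aggregates are overloaded by argument types
(system_schema.functions has argument_types in its key as well), so a real
key would need that part too.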

Annotations might be a column of map<text, text>. If you wanted it to be a
UDT, you could not save it into a virtual table, as that is not supported
yet (1).
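
To make the map<text, text> idea concrete, here is a small Python sketch
which turns the "@PII AND @DESCRIPTION('...')" form from Jyothsna's example
quoted below into name -> argument pairs. Storing an annotation without an
argument as an empty string is an assumption, and the regular expression
covers only that one proposed form with a single string argument.

```python
import re

# @NAME optionally followed by ('single-quoted argument');
# inside the argument, '' stands for one quote, as in CQL string literals
ANNOTATION = re.compile(r"@(\w+)(?:\(\s*'((?:[^']|'')*)'\s*\))?")


def parse_annotations(clause: str) -> dict[str, str]:
    """Turn "@PII AND @DESCRIPTION('text')" into {"PII": "", "DESCRIPTION": "text"}."""
    result = {}
    for match in ANNOTATION.finditer(clause):
        name, argument = match.group(1), match.group(2)
        # assumption: an annotation without an argument maps to an empty string
        result[name] = "" if argument is None else argument.replace("''", "'")
    return result


annotations = parse_annotations("@PII AND @DESCRIPTION('this is column col2')")
```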

What Jaydeep is suggesting makes sense. For example, for now, Jon's MCP
server points the model at system_views and tells it to look there, so it
"learns" what is going on inside Cassandra based on the content of these
tables.

If we had a table with annotations, he could just point it there and be
done with it; it would suddenly know about all the annotations.

If it was visible only from DESCRIBE ... how would that look? What would
be parsed? Does it mean that you would need to run "DESCRIBE KEYSPACE" for
each keyspace there is and then somehow learn how to parse the
annotations?

It would also be cool to just scan one table programmatically and know
what annotations there are. If it was only in DESCRIBE, how would you
quickly find where all your PII fields are (or whether there are any such
annotations at all)?
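
Roughly what I mean by scanning one table, as a Python sketch over rows as
they might come back from a hypothetical system_schema.annotations. The
column names keyspace_name, table_name, column_name and annotations are
assumptions; no such table exists today.

```python
from typing import Iterable, NamedTuple


class AnnotationRow(NamedTuple):
    """One row of a hypothetical system_schema.annotations table."""
    keyspace_name: str
    table_name: str
    column_name: str
    annotations: dict  # mirrors a map<text, text> column


def columns_annotated_with(rows: Iterable[AnnotationRow], name: str) -> list:
    """Return 'keyspace.table.column' for every row carrying the given annotation."""
    return [
        f"{row.keyspace_name}.{row.table_name}.{row.column_name}"
        for row in rows
        if name in row.annotations
    ]


# stand-in for the result of a single full scan of the table
rows = [
    AnnotationRow("test_ks", "test_table", "id", {"DESCRIPTION": "primary key"}),
    AnnotationRow("test_ks", "test_table", "col2",
                  {"PII": "", "DESCRIPTION": "this is column col2"}),
    AnnotationRow("shop", "customers", "email", {"PII": ""}),
]

pii_columns = columns_annotated_with(rows, "PII")
```

With DESCRIBE alone, the same answer would need one DESCRIBE per keyspace
plus a parser for its text output.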

(1) https://issues.apache.org/jira/browse/CASSANDRA-19560

On Mon, Aug 11, 2025 at 3:03 AM Yifan Cai <[email protected]> wrote:

>> Will there be an option to do a SELECT query to read all the annotations
>> of a table?
>
>
> It is an interesting question! Would you mind sharing an example of the
> output you'd expect from a query like *"SELECT * FROM
> system_schema.annotations where keyspace_name=<> and table_name=<>"*? I
> am curious how that might differ from what we get when running "DESC TABLE".
>
> - Yifan
>
> On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia <
> [email protected]> wrote:
>
>> >we could explore enriching the syntax with DESCRIBE
>>
>> Will there be an option to do a SELECT query to read all the annotations
>> of a table? Something like *"SELECT * FROM system_schema.annotations
>> where keyspace_name=<> and table_name=<>"*
>> It would be helpful to have a structured CQL query on top of printing the
>> annotations through DESC so that the information can be consumed easily.
>>
>> Jaydeep
>>
>> On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa <[email protected]>
>> wrote:
>>
>>> Thanks, Joel, for the positive response.
>>>
>>> 1. User-defined vs. pre-defined annotation types
>>>
>>> We'd like to have one predefined annotation, Description, but also give
>>> users the flexibility to create new ones. If a user feels that a custom
>>> annotation like @Desc suits their use case, they should be allowed to use
>>> it, as these elements are purely descriptive and have no actions associated
>>> with them.
>>>
>>> 2. Syntactically, is it worth considering other alternatives?
>>>
>>> You're concerned that having several annotations on multiple columns
>>> could make schemas difficult to read. For now, we can have annotations
>>> printed as part of DESCRIBE statements. If there's a strong need to
>>> suppress annotations for readability, we could explore enriching the syntax
>>> with DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the existing
>>> DESCRIBE [FULL] SCHEMA.
>>>
>>> Thanks,
>>> Jyothsna
>>>
>>> On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa <[email protected]>
>>> wrote:
>>>
>>>> Thanks, Stefan, for your feedback!
>>>>
>>>> To answer your questions,
>>>>
>>>> 1. I agree; annotations can optionally take arguments, and if an
>>>> annotation doesn't have an argument, we can skip the arguments in the
>>>> "DESCRIBE" statement's output.
>>>>
>>>> 2. Good point. We originally considered using "ANNOTATED WITH" but
>>>> found it too verbose. As an alternative, we proposed using "@" preceding
>>>> the annotation to signal it to the parser. We are open to using an explicit
>>>> phrase like "ANNOTATED WITH" if you think it would make the code more
>>>> readable.
>>>>
>>>> A full example of annotations along with constraints and masking could
>>>> be:
>>>>
>>>>
>>>> CREATE TABLE test_ks.test_table (
>>>>     id int PRIMARY KEY,
>>>>     col2 int CHECK col2 > 0 ANNOTATED WITH @PII AND
>>>>         @DESCRIPTION('this is column col2') MASKED WITH default()
>>>> );
>>>>
>>>> OR
>>>>
>>>> CREATE TABLE test_ks.test_table (
>>>>     id int PRIMARY KEY,
>>>>     col2 int CHECK col2 > 0 @PII AND
>>>>         @DESCRIPTION('this is column col2') MASKED WITH default()
>>>> );
>>>>
>>>>
>>>>
>>>> 3. We do not have a prototype yet, but I think we will have to
>>>> introduce a new parsing branch for annotations at the table level.
>>>>
>>>> I hope I answered all your questions!
>>>>
>>>> - Jyothsna
>>>>
>>>> On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd <[email protected]>
>>>> wrote:
>>>>
>>>>> I like the aim of the CEP. Completely on board with the idea that GenAI
>>>>> tooling works better when you can provide it useful context about the data
>>>>> it is working with. An organization I worked with in the past had a lot of
>>>>> good results with marking up API models (not DB schemas, but similar idea)
>>>>> with authorization-related annotations and using those to drive policy
>>>>> linters and end-user interfaces. So, sold on the value of the capability.
>>>>>
>>>>> Two things I'm less sure of:
>>>>>
>>>>> 1) User-defined vs pre-defined annotation types: I appreciate the
>>>>> flexibility that user-defined annotations appear to give, but it adds
>>>>> extra room for error. E.g. if annotation names are case-sensitive, do I
>>>>> (the user) have to actively prevent creation of @description? Or police
>>>>> the accidental creation of alternative names like @Desc? If the community
>>>>> settled on a small, fixed set of supported annotations, so Cassandra
>>>>> itself was authoritative for valid annotation names, would that make the
>>>>> feature a lot less valuable, or prevent offering user-defined annotations
>>>>> in the future?
>>>>>
>>>>> 2) Syntactically, is it worth considering other alternatives? I was
>>>>> trying to imagine a CREATE TABLE statement marked up with two or three
>>>>> types of column-level annotations, and my sense is that it could get hard
>>>>> to read quickly. Is it worth considering Javadoc-style annotations in
>>>>> schema comments instead? I think in today's world that means that they
>>>>> would not be accessible via CQL/Cassandra (CQL comments are not persisted
>>>>> as part of the schema, correct?) but they could be accessible to other
>>>>> schema-processing tools and IMO be a more readable syntax. It'd be good to
>>>>> work through a couple use-cases for actually using the data provided
>>>>> by the annotations and get a sense of whether making them first-class
>>>>> entities in CQL is necessary for getting most of the value from them.
>>>>>
>>>>> Thanks -- Joel.
>>>>> On 8/6/2025 6:59 PM, Jyothsna Konisa wrote:
>>>>>
>>>>> Sorry for the incorrect editable link, here is the updated link to the CEP
>>>>> 52: Schema Annotations for ApacheCassandra
>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP+52%3A+Schema+Annotations+for+ApacheCassandra>
>>>>>
>>>>> On Wed, Aug 6, 2025 at 4:26 PM Jyothsna Konisa <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello Everyone!
>>>>>>
>>>>>> We would like to propose CEP 52: Schema Annotations for
>>>>>> ApacheCassandra
>>>>>> <https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=373887528&draftShareId=339b7f4e-9bc2-45bd-9a80-b0d4215e3f45&;>
>>>>>>
>>>>>> This CEP outlines a plan to introduce *Schema Annotations* as a way
>>>>>> to add better context to schema elements. We're also proposing a set of
>>>>>> new DDL statements to manage these annotations.
>>>>>>
>>>>>> We believe these annotations will be highly beneficial for several
>>>>>> key areas:
>>>>>>
>>>>>>    - GenAI Applications: Providing more context to LLMs could
>>>>>>      significantly improve the accuracy and relevance of generated
>>>>>>      content.
>>>>>>    - Data Governance: Annotations can help in enforcing policies.
>>>>>>    - Compliance: They can be used to track and manage compliance
>>>>>>      requirements directly within the schema.
>>>>>>
>>>>>> We're eager to hear your thoughts and feedback on this proposal.
>>>>>> Please keep the discussion within this mailing thread.
>>>>>>
>>>>>> Thanks for your time and feedback in advance.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Jyothsna & Yifan
>>>>>>
>>>>>>
>>>>>>
>>>>>>
