Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

Yifan Cai Mon, 11 Aug 2025 10:35:50 -0700

IMO, the full schema or table schema output already makes it relatively easy
to filter the fields (not limited to columns) that use certain annotations,
by grepping or parsing, whichever suits the scenario better; consumers make
the call. A dedicated query would add little value while adding quite a lot
of complexity to the design of this CEP. Please correct me if I have
misunderstood the queries.
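
As a rough sketch of the parsing route, assuming "DESCRIBE TABLE" prints
annotations inline in the "@NAME" form proposed further down this thread (the
syntax is not finalized, so the sample output, the function name, and the
case-insensitive matching below are all assumptions for illustration):

```python
import re

# Hypothetical DESCRIBE TABLE output. The inline "@NAME" annotation form is
# taken from the example in this thread and may change as the CEP evolves.
DESCRIBE_OUTPUT = """\
CREATE TABLE test_ks.test_table (
    id int PRIMARY KEY,
    col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is column col2') MASKED WITH default(),
    col3 text @DESCRIPTION('free-form notes')
);
"""

# CQL string literals, with '' as the escaped quote; removed before matching
# so that an "@" inside a description is not mistaken for an annotation.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
ANNOTATION = re.compile(r"@(\w+)")


def columns_annotated_with(describe_output, annotation):
    """Return the names of columns whose definition line carries @annotation.

    Assumes one column definition per line, as in the sample output above.
    Names are compared case-insensitively, which is an assumption: the thread
    leaves case sensitivity of annotation names open.
    """
    wanted = annotation.upper()
    matches = []
    for line in describe_output.splitlines():
        stripped = line.strip()
        # Skip blank lines, the CREATE TABLE header, and the closing paren.
        if (not stripped
                or stripped.upper().startswith("CREATE ")
                or stripped.startswith(")")):
            continue
        without_literals = STRING_LITERAL.sub("", stripped)
        names = {name.upper() for name in ANNOTATION.findall(without_literals)}
        if wanted in names:
            matches.append(stripped.split()[0])
    return matches


print(columns_annotated_with(DESCRIBE_OUTPUT, "PII"))
```

A consumer that needs more than line-oriented matching (multi-line column
definitions, for example) would want a real parser instead.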


Another reason for preferring the existing "DESCRIBE" statements is the
gen-AI enrichment mentioned in the CEP. We most likely want to feed the LLM
the full (table) schema.

The primary goal is to enrich the schema with annotations. Through the
discussion thread, we will find out whether there is enough motivation to
support such queries to filter by annotation. I appreciate that you brought
up the idea.

We are not yet at the stage of discussing the implementation, but to share
my thoughts a bit: I am thinking of approach (1) that Stefan mentioned.

- Yifan

On Mon, Aug 11, 2025 at 6:31 AM Francisco Guerrero <[email protected]>
wrote:

> Another interesting query would be to retrieve all the fields annotated
> with PII, for example.
>
> On 2025/08/11 01:01:21 Yifan Cai wrote:
> > >
> > > Will there be an option to do a SELECT query to read all the
> > > annotations of a table?
> >
> >
> > It is an interesting question! Would you mind sharing an example of the
> > output you'd expect from a query like *"SELECT * FROM
> > system_schema.annotations where keyspace_name=<> and table_name=<>"*? I am
> > curious how that might differ from what we get when running "DESC TABLE".
> >
> > - Yifan
> >
> > On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia <[email protected]>
> > wrote:
> >
> > > >we could explore enriching the syntax with DESCRIBE
> > >
> > > Will there be an option to do a SELECT query to read all the
> > > annotations of a table? Something like *"SELECT * FROM
> > > system_schema.annotations where keyspace_name=<> and table_name=<>"*
> > > It would be helpful to have a structured CQL query on top of printing
> > > the annotations through DESC so that the information can be consumed
> > > easily.
> > >
> > > Jaydeep
> > >
> > > On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa <[email protected]>
> > > wrote:
> > >
> > >> Thanks, Joel, for the positive response.
> > >>
> > >> 1. User-defined vs. pre-defined annotation types
> > >>
> > >> We'd like to have one predefined annotation, Description, but also give
> > >> users the flexibility to create new ones. If a user feels that a custom
> > >> annotation like @Desc suits their use case, they should be allowed to use
> > >> it, as these elements are purely descriptive and have no actions
> > >> associated with them.
> > >>
> > >> 2. Syntactically, is it worth considering other alternatives?
> > >>
> > >> You're concerned that having several annotations on multiple columns
> > >> could make schemas difficult to read. For now, we can have annotations
> > >> printed as part of DESCRIBE statements. If there's a strong need to
> > >> suppress annotations for readability, we could explore enriching the
> > >> syntax with DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the
> > >> existing DESCRIBE [FULL] SCHEMA.
> > >>
> > >> Thanks,
> > >> Jyothsna
> > >>
> > >> On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa <[email protected]>
> > >> wrote:
> > >>
> > >>> Thanks, Stefan, for your feedback!
> > >>>
> > >>> To answer your questions,
> > >>>
> > >>> 1. I agree; annotations can optionally take arguments, and if an
> > >>> annotation doesn't have an argument, we can skip the arguments in the
> > >>> "DESCRIBE" statement's output.
> > >>>
> > >>> 2. Good point. We originally considered using "ANNOTATED WITH" but found
> > >>> it too verbose. As an alternative, we proposed using "@" preceding the
> > >>> annotation to signal it to the parser. We are open to using an explicit
> > >>> phrase like "ANNOTATED WITH" if you think it would make the code more
> > >>> readable.
> > >>>
> > >>> A full example of annotations along with constraints and masking could
> > >>> be:
> > >>>
> > >>>
> > >>> CREATE TABLE test_ks.test_table (
> > >>>     id int PRIMARY KEY,
> > >>>     col2 int CHECK col2 > 0
> > >>>         ANNOTATED WITH @PII AND @DESCRIPTION('this is column col2')
> > >>>         MASKED WITH default()
> > >>> );
> > >>>
> > >>> OR
> > >>>
> > >>> CREATE TABLE test_ks.test_table (
> > >>>     id int PRIMARY KEY,
> > >>>     col2 int CHECK col2 > 0
> > >>>         @PII AND @DESCRIPTION('this is column col2')
> > >>>         MASKED WITH default()
> > >>> );
> > >>>
> > >>>
> > >>>
> > >>> 3. We do not have a prototype yet, but I think we will have to introduce
> > >>> a new parsing branch for annotations at the table level.
> > >>>
> > >>> I hope I answered all your questions!
> > >>>
> > >>> - Jyothsna
> > >>>
> > >>> On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> I like the aim of the CEP. Completely onboard with the idea that GenAI
> > >>>> tooling works better when you can provide it useful context about the
> > >>>> data it is working with. An organization I worked with in the past had a
> > >>>> lot of good results with marking up API models (not DB schemas, but
> > >>>> similar idea) with authorization-related annotations and using those to
> > >>>> drive policy linters and end-user interfaces. So, sold on the value of
> > >>>> the capability.
> > >>>>
> > >>>> Two things I'm less sure of:
> > >>>>
> > >>>> 1) User-defined vs pre-defined annotation types: I appreciate the
> > >>>> flexibility that user-defined annotations appear to give, but it adds
> > >>>> extra room for error. E.g. if annotation names are case-sensitive, do I
> > >>>> (the user) have to actively prevent creation of @description? Or police
> > >>>> the accidental creation of alternative names like @Desc? If the
> > >>>> community settled on a small, fixed set of supported annotations, so
> > >>>> that Cassandra itself was authoritative for valid annotation names,
> > >>>> would that make the feature a lot less valuable, or prevent offering
> > >>>> user-defined annotations in the future?
> > >>>>
> > >>>> 2) Syntactically, is it worth considering other alternatives? I was
> > >>>> trying to imagine a CREATE TABLE statement marked up with two or three
> > >>>> types of column-level annotations, and my sense is that it could get
> > >>>> hard to read quickly. Is it worth considering Javadoc-style annotations
> > >>>> in schema comments instead? I think in today's world that means that
> > >>>> they would not be accessible via CQL/Cassandra (CQL comments are not
> > >>>> persisted as part of the schema, correct?) but they could be accessible
> > >>>> to other schema-processing tools and IMO be a more readable syntax. It'd
> > >>>> be good to work through a couple use-cases for actually using the data
> > >>>> provided by the annotations and get a sense of whether making them
> > >>>> first-class entities in CQL is necessary for getting most of the value
> > >>>> from them.
> > >>>>
> > >>>> Thanks -- Joel.
> > >>>> On 8/6/2025 6:59 PM, Jyothsna Konisa wrote:
> > >>>>
> > >>>> Sorry for the incorrect editable link; here is the updated link to the
> > >>>> CEP 52: Schema Annotations for ApacheCassandra
> > >>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP+52%3A+Schema+Annotations+for+ApacheCassandra>
> > >>>>
> > >>>> On Wed, Aug 6, 2025 at 4:26 PM Jyothsna Konisa <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>> Hello Everyone!
> > >>>>>
> > >>>>> We would like to propose CEP 52: Schema Annotations for
> > >>>>> ApacheCassandra
> > >>>>> <https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=373887528&draftShareId=339b7f4e-9bc2-45bd-9a80-b0d4215e3f45&;>
> > >>>>>
> > >>>>> This CEP outlines a plan to introduce *Schema Annotations* as a way
> > >>>>> to add better context to schema elements. We're also proposing a set of
> > >>>>> new DDL statements to manage these annotations.
> > >>>>>
> > >>>>> We believe these annotations will be highly beneficial for several key
> > >>>>> areas:
> > >>>>>
> > >>>>>    - GenAI Applications: Providing more context to LLMs could
> > >>>>>      significantly improve the accuracy and relevance of generated
> > >>>>>      content.
> > >>>>>    - Data Governance: Annotations can help in enforcing policies.
> > >>>>>    - Compliance: They can be used to track and manage compliance
> > >>>>>      requirements directly within the schema.
> > >>>>>
> > >>>>> We're eager to hear your thoughts and feedback on this proposal.
> > >>>>> Please keep the discussion within this mailing thread.
> > >>>>>
> > >>>>> Thanks for your time and feedback in advance.
> > >>>>>
> > >>>>> Best regards,
> > >>>>>
> > >>>>> Jyothsna & Yifan
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> >
>
