The reasoning about operator and LLM familiarity is spot on. I have experimented with LLM-generated queries, and the models typically do a noticeably better job on SQL than on CQL.
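
To make the "find all columns labeled PII" lookup concrete, here is a minimal sketch in Python that runs against in-memory rows rather than a real cluster. It assumes the (provider, object_type, object_name, sub_name, value) shape suggested for system_schema.annotations and the case-insensitive provider names with canonical lowercase. That view does not exist in Cassandra today, and the helper names and sample rows are hypothetical, made up purely for illustration.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    # Fields mirror the view shape suggested in this thread (an assumption;
    # no such view exists in Cassandra today).
    provider: str
    object_type: str
    object_name: str
    sub_name: str
    value: str


def canonical(provider: str) -> str:
    """Treat provider names as case-insensitive; lowercase is canonical."""
    return provider.lower()


def columns_labeled(rows, provider):
    """Return (object_name, sub_name) for every column carrying the label."""
    wanted = canonical(provider)
    return [
        (r.object_name, r.sub_name)
        for r in rows
        if r.object_type == "column" and canonical(r.provider) == wanted
    ]


# Hypothetical rows, as if read from system_schema.annotations.
ROWS = [
    Annotation("pii", "column", "test_ks.test_table", "col2", "true"),
    Annotation("PII", "column", "test_ks.users", "email", "true"),
    Annotation("owner", "table", "test_ks.test_table", "", "data-platform"),
]

pii_columns = columns_labeled(ROWS, "Pii")
print(pii_columns)
```

The same answer could be obtained by parsing DESCRIBE output; the sketch only illustrates how little a consumer would need to write if the annotations were exposed as structured rows.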
- Yifan

On Mon, Aug 11, 2025 at 1:44 PM Patrick McFadin <pmcfa...@gmail.com> wrote:

> I really love this CEP. +1 on the goal.
>
> As you've already seen, I've been advocating to improve our syntax ergonomics towards more mainstream SQL and avoiding new/custom syntax. I would suggest the following changes towards that goal:
>
> - Reuse PG-shaped DDL. Keep human text in COMMENT ON[1] (map existing table comments to that). For structured tags, mirror SECURITY LABEL[2]:
>   SECURITY LABEL FOR <provider> ON <object> IS '<text>';
>
> - Allow multiple providers per object. Store the value as text in v1 (JSON or key/val later if we want), which avoids inventing new inline @ syntax.
>
> - Avoid new grammar in CREATE/ALTER. Skipping inline @PII keeps schemas readable and the grammar simple. Tools can issue COMMENT ON/SECURITY LABEL right after DDL, like PG users do today.
>
> - Names & built-ins. Case-insensitive provider names with canonical lowercase. No separate @Description type. COMMENT ON already covers that use case cleanly.
>
> - Introspection by query and by DESC. Keep annotations visible in DESCRIBE, but also expose a single system_schema.annotations view (provider, object_type, object_name, sub_name, value) so folks can get all annotations for a table. Example: “find all columns labeled PII,” etc.
>
> Why PG-like? Besides operator familiarity, there’s far more training data and tooling around COMMENT ON/SECURITY LABEL than around bespoke @annotation syntax. Sticking to that shape reduces LLM/tool friction and avoids teaching the world a new grammar. This has been a huge challenge for Cassandra work with LLMs as models tend to drift towards PG SQL in CQL often. (No Claude, JOIN is not a keyword in Cassandra)
>
> If this direction sounds good, happy to help update the CEP text and examples.
>
> Patrick
>
> 1: COMMENT ON docs
> https://www.postgresql.org/docs/current/sql-comment.html
> 2: SECURITY LABEL docs
> https://www.postgresql.org/docs/current/sql-security-label.html
>
> On Mon, Aug 11, 2025 at 10:18 AM Yifan Cai <yc25c...@gmail.com> wrote:
>
>> IMO, the full schema or table schema output already makes it possible to filter the fields (not limited to columns) that are using certain annotations, relatively easily. Grepping or parsing, whichever is more suitable for the scenarios; consumers make the call.
>> There is not much added value by providing such a dedicated query, however, adding quite a lot of complexity in the design of this CEP. Please correct me if I have the wrong understanding of the queries.
>>
>> Another reason for preferring the existing "DESCRIBE" statements is the gen-AI enrichment mentioned in the CEP. We most likely want to feed the LLM the full (table) schema.
>>
>> The primary goal is to enrich the schema with annotations. Through the discussion thread, we will find out whether there is enough motivation to support such queries to filter by annotation. I appreciate that you brought up the idea.
>>
>> Although we are not at the stage of talking about the implementation, just sharing my thoughts a bit, I am thinking of the approach (1) that Stefan mentioned.
>>
>> - Yifan
>>
>> On Mon, Aug 11, 2025 at 6:31 AM Francisco Guerrero <fran...@apache.org> wrote:
>>
>>> Another interesting query would be to retrieve all the fields annotated with PII for example.
>>>
>>> On 2025/08/11 01:01:21 Yifan Cai wrote:
>>> > > Will there be an option to do a SELECT query to read all the annotations of a table?
>>> >
>>> > It is an interesting question! Would you mind sharing an example of the output you'd expect from a query like *"SELECT * FROM system_schema.annotations where keyspace_name=<> and table_name=<>"*? I am curious how that might differ from what we get when running "DESC TABLE".
>>> >
>>> > - Yifan
>>> >
>>> > On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>> >
>>> > > >we could explore enriching the syntax with DESCRIBE
>>> > >
>>> > > Will there be an option to do a SELECT query to read all the annotations of a table? Something like *"SELECT * FROM system_schema.annotations where keyspace_name=<> and table_name=<>"*
>>> > > It would be helpful to have a structured CQL query on top of printing the annotations through DESC so that the information can be consumed easily.
>>> > >
>>> > > Jaydeep
>>> > >
>>> > > On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa <jyothsna1...@gmail.com> wrote:
>>> > >
>>> > >> Thanks, Joel, for the positive response.
>>> > >>
>>> > >> 1. User-defined vs. pre-defined annotation types
>>> > >>
>>> > >> We'd like to have one predefined annotation, Description, but also give users the flexibility to create new ones. If a user feels that a custom annotation like @Desc suits their use case, they should be allowed to use it, as these elements are purely descriptive and have no actions associated with them.
>>> > >>
>>> > >> 2. Syntactically, is it worth considering other alternatives?
>>> > >>
>>> > >> You're concerned that having several annotations on multiple columns could make schemas difficult to read. For now, we can have annotations printed as part of DESCRIBE statements. If there's a strong need to suppress annotations for readability, we could explore enriching the syntax with DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the existing DESCRIBE [FULL] SCHEMA.
>>> > >>
>>> > >> Thanks,
>>> > >> Jyothsna
>>> > >>
>>> > >> On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa <jyothsna1...@gmail.com> wrote:
>>> > >>
>>> > >>> Thanks, Stefan, for your feedback!
>>> > >>>
>>> > >>> To answer your questions,
>>> > >>>
>>> > >>> 1. I agree; annotations can optionally take arguments, and if an annotation doesn't have an argument, we can skip the arguments in the "DESCRIBE" statement's output.
>>> > >>>
>>> > >>> 2. Good point. We originally considered using "ANNOTATED WITH" but found it too verbose. As an alternative, we proposed using "@" preceding the annotation to signal it to the parser. We are open to using an explicit phrase like "ANNOTATED WITH" if you think it would make the code more readable.
>>> > >>>
>>> > >>> A full example of annotations along with constraints and masking could be:
>>> > >>>
>>> > >>> CREATE TABLE test_ks.test_table (
>>> > >>>     id int PRIMARY KEY,
>>> > >>>     col2 int CHECK col2 > 0 ANNOTATED WITH @PII AND @DESCRIPTION('this is column col2') MASKED WITH default()
>>> > >>> );
>>> > >>>
>>> > >>> OR
>>> > >>>
>>> > >>> CREATE TABLE test_ks.test_table (
>>> > >>>     id int PRIMARY KEY,
>>> > >>>     col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is column col2') MASKED WITH default()
>>> > >>> );
>>> > >>>
>>> > >>> 3. We do not have a prototype yet, but I think we will have to introduce new parsing branch for annotations at the table level
>>> > >>>
>>> > >>> I hope I answered all your questions!
>>> > >>>
>>> > >>> - Jyothsna
>>> > >>>
>>> > >>> On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd <sheph...@amazon.com> wrote:
>>> > >>>
>>> > >>>> I like the aim of the CEP. Completely onboard with the idea that GenAI tooling works better when you can provide it useful context about the data it is working with. An organization I worked with in the past had a lot of good results with marking up API models (not DB schemas, but similar idea) with authorization-related annotations and using those to drive policy linters and end-user interfaces. So, sold on the value of the capability.
>>> > >>>>
>>> > >>>> Two things I'm less sure of:
>>> > >>>>
>>> > >>>> 1) User-defined vs pre-defined annotation types: I appreciate the flexibility that user-defined annotations appears to give, but it adds extra room for error. E.g. if annotation names are case-sensitive, do I (the user) have to actively prevent creation of @description? Or, police the accidental creation of alternative names like @Desc? If the community settled on a small, fixed set of supported annotations, so Cassandra itself was authoritative for valid annotation names, would make the feature a lot less valuable, or prevent offering user-defined annotations in the future?
>>> > >>>>
>>> > >>>> 2) Syntactically, is it worth considering other alternatives? I was trying to imagine a CREATE TABLE statement marked up with two or three types of column-level annotations, and my sense is that it could get hard to read quickly. Is it worth considering Javadoc-style annotations in schema comments instead? I think in today's world that means that they would not be accessible via CQL/Cassandra (CQL comments are not persisted as part of the schema, correct?) but they could be accessible to other schema-processing tools and IMO be a more readable syntax. It'd be good to work through a couple use-cases for actually using the data provided by the annotations and get a sense of whether making them first-class entities in CQL is necessary for getting most of the value from them.
>>> > >>>>
>>> > >>>> Thanks -- Joel.
>>> > >>>>
>>> > >>>> On 8/6/2025 6:59 PM, Jyothsna Konisa wrote:
>>> > >>>>
>>> > >>>> Sorry for the incorrect editable link, here is the updated link to the CEP 52: Schema Annotations for ApacheCassandra
>>> > >>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP+52%3A+Schema+Annotations+for+ApacheCassandra>
>>> > >>>>
>>> > >>>> On Wed, Aug 6, 2025 at 4:26 PM Jyothsna Konisa <jyothsna1...@gmail.com> wrote:
>>> > >>>>
>>> > >>>>> Hello Everyone!
>>> > >>>>>
>>> > >>>>> We would like to propose CEP 52: Schema Annotations for ApacheCassandra
>>> > >>>>> <https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=373887528&draftShareId=339b7f4e-9bc2-45bd-9a80-b0d4215e3f45&>
>>> > >>>>>
>>> > >>>>> This CEP outlines a plan to introduce *Schema Annotations* as a way to add better context to schema elements. We're also proposing a set of new DDL statements to manage these annotations.
>>> > >>>>>
>>> > >>>>> We believe these annotations will be highly beneficial for several key areas:
>>> > >>>>>
>>> > >>>>> - GenAI Applications: Providing more context to LLMs could significantly improve the accuracy and relevance of generated content.
>>> > >>>>> - Data Governance: Annotations can help in enforcing policies using annotations
>>> > >>>>> - Compliance: They can be used to track and manage compliance requirements directly within the schema.
>>> > >>>>>
>>> > >>>>> We're eager to hear your thoughts and feedback on this proposal. Please keep the discussion within this mailing thread.
>>> > >>>>>
>>> > >>>>> Thanks for your time and feedback in advance.
>>> > >>>>>
>>> > >>>>> Best regards,
>>> > >>>>>
>>> > >>>>> Jyothsna & Yifan