One more point I would like to add. If we enrich the output with comments, I think that showing comments should be the default only if I can take what DESCRIBE prints, copy it as-is, and create tables from it. Very often, DESCRIBE is used as "I will copy this schema here so I can reconstruct it later". So I would expect that, by default, what DESCRIBE gives is "reconstructable". I think there are already a lot of tests which verify that what DESCRIBE prints can be reconstructed, and this would need to be preserved.
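To make "reconstructable" concrete, here is a rough Python sketch of the kind of round-trip check I have in mind. It is only an illustration under my own assumptions, not an existing Cassandra test: strip_cql_comments and normalize are hypothetical helpers, and the parsing is simplified (it handles the CQL comment styles --, // and /* */, plus single-quoted strings and double-quoted identifiers, but ignores $$-quoted strings).

```python
import re

# Quoted tokens are matched first so comment markers inside them are left alone.
_STRING = r"'(?:[^']|'')*'"    # string literal ('' escapes a quote)
_IDENT = r'"(?:[^"]|"")*"'     # quoted identifier ("" escapes a quote)
_TOKEN = re.compile(
    _STRING + "|" + _IDENT + r"|/\*.*?\*/|(?:--|//)[^\n]*",
    re.DOTALL,                 # lets a block comment span several lines
)
_SPACE = re.compile(_STRING + "|" + _IDENT + r"|\s+")


def _is_quoted(token: str) -> bool:
    return token[0] in "'\""


def strip_cql_comments(ddl: str) -> str:
    """Replace every comment with a single space, keep quoted tokens."""
    return _TOKEN.sub(
        lambda m: m.group(0) if _is_quoted(m.group(0)) else " ", ddl
    )


def normalize(ddl: str) -> str:
    """Strip comments, then collapse whitespace outside quoted tokens."""
    collapsed = _SPACE.sub(
        lambda m: m.group(0) if _is_quoted(m.group(0)) else " ",
        strip_cql_comments(ddl),
    )
    return collapsed.strip()


# Hypothetical DESCRIBE output without and with comments.
PLAIN = """CREATE TABLE ks.tb (
    id int PRIMARY KEY,
    val text
);"""

COMMENTED = """CREATE TABLE ks.tb (
    -- my primary key column
    id int PRIMARY KEY,
    -- this is my value
    val text
);"""

if __name__ == "__main__":
    # The commented output reduces to the same DDL as the plain output.
    print(normalize(COMMENTED) == normalize(PLAIN))
```

One caveat the sketch makes visible: line comments only survive copy-pasting if the line breaks survive too. Feeding it the commented DDL joined onto a single line, normalize(" ".join(COMMENTED.split())), leaves only the text before the first "--".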
We might still do "DESCRIBE ks.tb" without comments / annotations and then "DESCRIBE ks.tb WITH COMMENTS / ANNOTATIONS" to print them. If we put comments on this, it is "reconstructable by copy-pasting" as well:

create table ks.tb (
    -- my primary key column
    id int primary key,
    -- this is my value
    val text
)

however, this is not:

create table ks.tb (
    /** my primary key column */
    id int primary key,
    val text
)

you got me ...

Also, if we start to automatically enrich DESCRIBE output, it would be very nice if this was digestible by previous versions. If I copy DESCRIBE output from 5.1 with @PII, I cannot just apply it to 5.0, where that concept is not known yet. However, plain comments do work in previous versions as well. For this reason, I would not make annotations visible by default; I would make them opt-in via WITH COMMENTS / WITH ANNOTATIONS only and keep the current output as is.

On Tue, Aug 12, 2025 at 10:56 AM Mick <m...@apache.org> wrote:

> a point of order and a reminder: aside from suggestions that the CEP author is free to adopt or not, anything that's assuming to steer what the CEP should be should be accompanied with the willingness to commit to helping make it happen. we want to work as a meritocracy: those that lead the work have the say, and blocking their chosen approach against their wishes is only on clear technical reasons. API designs (CQL additions) always need to be chosen and evolved carefully, and every CEP proposed should be open to that being naturally part of its discussion pre-vote.
>
> following the PG approach does make a lot of sense.
> what are your thoughts on it Jyothsna & Yifan ?
>
> On 12 Aug 2025, at 09:14, Štefan Miklošovič <smikloso...@apache.org> wrote:
> >
> > I like the idea of COMMENT ON and alike from PG! Yes, great stuff, as we do not invent anything custom and we will be as close as possible to industry standard.
> >
> > So, if I understand this correctly, on COMMENT ON, we would save each comment to a dedicated table. Then on DESCRIBE, we would "enrich" the CQL element we are describing with commentary, if any, from that comment table, correct?
> >
> > I, in general, support this idea, but as usual the devil is in the details. I am just genuinely curious how this would work in practice.
> >
> > If we go with COMMENT ON, is this going to be stored in TCM or not?
> >
> > If the answer is yes, then it is much simpler, because then this commentary would be dispersed by the means of TCM and each node would apply this transformation locally to system_schema.annotations.
> >
> > If the answer is no and if there is a cluster and we do COMMENT ON, then this comment has to be saved to a table. If we rule out TCM as a vehicle for the dispersion of these comments, that comment table has to be distributed / replicated, correct? I do not think that we can create that table under system_schema then, as that is on LocalStrategy and all modifications to that are, as I understand it, done via TCM?
> >
> > Hence, I guess the better place for that is under system_distributed? That means that if somebody changes that keyspace to NTS or nodes are not available, we will not be able to create any commentary.
> >
> > Also, if we remove / alter anything, like dropping a keyspace, table, index, removing column etc ... all these changes would need to also remove respective comments from that table etc etc.
> >
> > For these reasons, I think that having a dedicated system_schema.annotations table while interacting with it via COMMENT ON to be "PG-compatible" so people can query that table directly, and backing COMMENT ON by TCM by having it as another transformation (as COMMENT ON is inherently part of the schema) is the best way to do this.
> >
> > On Mon, Aug 11, 2025 at 10:55 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
> > One (of many) reasons I'm advocating we migrate away from CQL. It served a purpose at the time, but this project is evolving and this to me seems like the logical next iteration. The Cassandra project has built its reputation on what it can do, not clever syntax design. ;)
> >
> > Patrick
> >
> > On Mon, Aug 11, 2025 at 1:51 PM Yifan Cai <yc25c...@gmail.com> wrote:
> > The reasonings on operator and LLM familiarity are spot on.
> >
> > I have experimented with LLM generated queries. It typically does a noticeably better job on SQL than CQL.
> >
> > - Yifan
> >
> > On Mon, Aug 11, 2025 at 1:44 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
> > I really love this CEP. +1 on the goal.
> >
> > As you've already seen, I've been advocating to improve our syntax ergonomics towards more mainstream SQL and avoiding new/custom syntax. I would suggest the following changes towards that goal:
> > - Reuse PG-shaped DDL. Keep human text in COMMENT ON[1] (map existing table comments to that). For structured tags, mirror SECURITY LABEL[2]:
> >   SECURITY LABEL FOR <provider> ON <object> IS '<text>';
> >
> > - Allow multiple providers per object. Store the value as text in v1 (JSON or key/val later if we want), which avoids inventing new inline @ syntax.
> >
> > - Avoid new grammar in CREATE/ALTER. Skipping inline @PII keeps schemas readable and the grammar simple. Tools can issue COMMENT ON/SECURITY LABEL right after DDL, like PG users do today.
> >
> > - Names & built-ins. Case-insensitive provider names with canonical lowercase. No separate @Description type. COMMENT ON already covers that use case cleanly.
> >
> > - Introspection by query and by DESC. Keep annotations visible in DESCRIBE, but also expose a single system_schema.annotations view (provider, object_type, object_name, sub_name, value) so folks can get all annotations for a table.
> >   Example: “find all columns labeled PII,” etc.
> >
> > Why PG-like? Besides operator familiarity, there’s far more training data and tooling around COMMENT ON/SECURITY LABEL than around bespoke @annotation syntax. Sticking to that shape reduces LLM/tool friction and avoids teaching the world a new grammar. This has been a huge challenge for Cassandra work with LLMs as models tend to drift towards PG SQL in CQL often. (No Claude, JOIN is not a keyword in Cassandra)
> >
> > If this direction sounds good, happy to help update the CEP text and examples.
> >
> > Patrick
> >
> > 1: COMMENT ON docs https://www.postgresql.org/docs/current/sql-comment.html
> > 2: SECURITY LABEL docs https://www.postgresql.org/docs/current/sql-security-label.html
> >
> > On Mon, Aug 11, 2025 at 10:18 AM Yifan Cai <yc25c...@gmail.com> wrote:
> > IMO, the full schema or table schema output already makes it possible to filter the fields (not limited to columns) that are using certain annotations, relatively easily. Grepping or parsing, whichever is more suitable for the scenarios; consumers make the call.
> > There is not much added value in providing such a dedicated query, while it adds quite a lot of complexity to the design of this CEP. Please correct me if I have the wrong understanding of the queries.
> >
> > Another reason for preferring the existing "DESCRIBE" statements is the gen-AI enrichment mentioned in the CEP. We most likely want to feed the LLM the full (table) schema.
> >
> > The primary goal is to enrich the schema with annotations. Through the discussion thread, we will find out whether there is enough motivation to support such queries to filter by annotation. I appreciate that you brought up the idea.
> >
> > Although we are not at the stage of talking about the implementation, just sharing my thoughts a bit, I am thinking of the approach (1) that Stefan mentioned.
> >
> > - Yifan
> >
> > On Mon, Aug 11, 2025 at 6:31 AM Francisco Guerrero <fran...@apache.org> wrote:
> > Another interesting query would be to retrieve all the fields annotated with PII for example.
> >
> > On 2025/08/11 01:01:21 Yifan Cai wrote:
> > > >
> > > > Will there be an option to do a SELECT query to read all the annotations of a table?
> > >
> > > It is an interesting question! Would you mind sharing an example of the output you'd expect from a query like *"SELECT * FROM system_schema.annotations where keyspace_name=<> and table_name=<>"*? I am curious how that might differ from what we get when running "DESC TABLE".
> > >
> > > - Yifan
> > >
> > > On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
> > >
> > > > >we could explore enriching the syntax with DESCRIBE
> > > >
> > > > Will there be an option to do a SELECT query to read all the annotations of a table? Something like *"SELECT * FROM system_schema.annotations where keyspace_name=<> and table_name=<>"*
> > > > It would be helpful to have a structured CQL query on top of printing the annotations through DESC so that the information can be consumed easily.
> > > >
> > > > Jaydeep
> > > >
> > > > On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa <jyothsna1...@gmail.com> wrote:
> > > >
> > > >> Thanks, Joel, for the positive response.
> > > >>
> > > >> 1. User-defined vs. pre-defined annotation types
> > > >>
> > > >> We'd like to have one predefined annotation, Description, but also give users the flexibility to create new ones. If a user feels that a custom annotation like @Desc suits their use case, they should be allowed to use it, as these elements are purely descriptive and have no actions associated with them.
> > > >>
> > > >> 2. Syntactically, is it worth considering other alternatives?
> > > >>
> > > >> You're concerned that having several annotations on multiple columns could make schemas difficult to read. For now, we can have annotations printed as part of DESCRIBE statements. If there's a strong need to suppress annotations for readability, we could explore enriching the syntax with DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the existing DESCRIBE [FULL] SCHEMA.
> > > >>
> > > >> Thanks,
> > > >> Jyothsna
> > > >>
> > > >> On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa <jyothsna1...@gmail.com> wrote:
> > > >>
> > > >>> Thanks, Stefan, for your feedback!
> > > >>>
> > > >>> To answer your questions,
> > > >>>
> > > >>> 1. I agree; annotations can optionally take arguments, and if an annotation doesn't have an argument, we can skip the arguments in the "DESCRIBE" statement's output.
> > > >>>
> > > >>> 2. Good point. We originally considered using "ANNOTATED WITH" but found it too verbose. As an alternative, we proposed using "@" preceding the annotation to signal it to the parser. We are open to using an explicit phrase like "ANNOTATED WITH" if you think it would make the code more readable.
> > > >>>
> > > >>> A full example of annotations along with constraints and masking could be:
> > > >>>
> > > >>> CREATE TABLE test_ks.test_table (
> > > >>>     id int PRIMARY KEY,
> > > >>>     col2 int CHECK col2 > 0 ANNOTATED WITH @PII AND @DESCRIPTION('this is column col2') MASKED WITH default()
> > > >>> );
> > > >>>
> > > >>> OR
> > > >>>
> > > >>> CREATE TABLE test_ks.test_table (
> > > >>>     id int PRIMARY KEY,
> > > >>>     col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is column col2') MASKED WITH default()
> > > >>> );
> > > >>>
> > > >>> 3.
> > > >>> We do not have a prototype yet, but I think we will have to introduce a new parsing branch for annotations at the table level.
> > > >>>
> > > >>> I hope I answered all your questions!
> > > >>>
> > > >>> - Jyothsna
> > > >>>
> > > >>> On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd <sheph...@amazon.com> wrote:
> > > >>>
> > > >>>> I like the aim of the CEP. Completely onboard with the idea that GenAI tooling works better when you can provide it useful context about the data it is working with. An organization I worked with in the past had a lot of good results with marking up API models (not DB schemas, but similar idea) with authorization-related annotations and using those to drive policy linters and end-user interfaces. So, sold on the value of the capability.
> > > >>>>
> > > >>>> Two things I'm less sure of:
> > > >>>>
> > > >>>> 1) User-defined vs pre-defined annotation types: I appreciate the flexibility that user-defined annotations appear to give, but it adds extra room for error. E.g. if annotation names are case-sensitive, do I (the user) have to actively prevent creation of @description? Or, police the accidental creation of alternative names like @Desc? If the community settled on a small, fixed set of supported annotations, so Cassandra itself was authoritative for valid annotation names, would that make the feature a lot less valuable, or prevent offering user-defined annotations in the future?
> > > >>>>
> > > >>>> 2) Syntactically, is it worth considering other alternatives? I was trying to imagine a CREATE TABLE statement marked up with two or three types of column-level annotations, and my sense is that it could get hard to read quickly. Is it worth considering Javadoc-style annotations in schema comments instead?
> > > >>>> I think in today's world that means that they would not be accessible via CQL/Cassandra (CQL comments are not persisted as part of the schema, correct?) but they could be accessible to other schema-processing tools and IMO be a more readable syntax. It'd be good to work through a couple use-cases for actually using the data provided by the annotations and get a sense of whether making them first-class entities in CQL is necessary for getting most of the value from them.
> > > >>>>
> > > >>>> Thanks -- Joel.
> > > >>>>
> > > >>>> On 8/6/2025 6:59 PM, Jyothsna Konisa wrote:
> > > >>>>
> > > >>>> Sorry for the incorrect editable link, here is the updated link to the CEP 52: Schema Annotations for ApacheCassandra
> > > >>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP+52%3A+Schema+Annotations+for+ApacheCassandra>
> > > >>>>
> > > >>>> On Wed, Aug 6, 2025 at 4:26 PM Jyothsna Konisa <jyothsna1...@gmail.com> wrote:
> > > >>>>
> > > >>>>> Hello Everyone!
> > > >>>>>
> > > >>>>> We would like to propose CEP 52: Schema Annotations for ApacheCassandra
> > > >>>>> <https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=373887528&draftShareId=339b7f4e-9bc2-45bd-9a80-b0d4215e3f45&>
> > > >>>>>
> > > >>>>> This CEP outlines a plan to introduce *Schema Annotations* as a way to add better context to schema elements. We're also proposing a set of new DDL statements to manage these annotations.
> > > >>>>>
> > > >>>>> We believe these annotations will be highly beneficial for several key areas:
> > > >>>>>
> > > >>>>> - GenAI Applications: Providing more context to LLMs could significantly improve the accuracy and relevance of generated content.
> > > >>>>> - Data Governance: Annotations can help in enforcing policies.
> > > >>>>> - Compliance: They can be used to track and manage compliance requirements directly within the schema.
> > > >>>>>
> > > >>>>> We're eager to hear your thoughts and feedback on this proposal. Please keep the discussion within this mailing thread.
> > > >>>>>
> > > >>>>> Thanks for your time and feedback in advance.
> > > >>>>>
> > > >>>>> Best regards,
> > > >>>>>
> > > >>>>> Jyothsna & Yifan