Re: [DISCUSS] SQL support in Cassandra

Patrick McFadin Tue, 04 Nov 2025 11:43:01 -0800

Just to be clear, in my initial proposal I said that CQL can never go away. 
It’s a life sentence. Knowing the upgrade cycle that many users are on, it will 
be 50 years before we could even try.


I feel we are at a fork here in the discussion. 

Fork 1: Discuss and somehow ratify that we adhere to SQL syntax for new CQL 
features 

Fork 2: Formation of a SIG or new DISCUSS thread on how to add SQL as a formal 
path. Really good ideas have already been thrown around, and that should 
continue. Josh wrapped it up nicely with a “Stateless layer that could serve 
many purposes” 

That’s my proposal. WDYT?

Patrick

> On Nov 4, 2025, at 11:18 AM, Josh McKenzie <[email protected]> wrote:
> 
> Good point Joey; I was rather focused on the ergonomics of the implicit 
> constraints that come with CQL vs. SQL and the gap we'd have to bridge to make 
> a SQL-centric world have the same design language as CQL today.
> 
> We can't afford to drop CQL at this point unless we had an overwhelmingly 
> bullet-proof CQL->SQL translation layer that didn't introduce new edge cases 
> nor performance degradation compared to CQL directly today. Users would have 
> to have the ability for existing CQL applications to Just Work when migrated 
> onto some new paradigm where the existing CQL native protocol endpoints were 
> deprecated. At that point we'd just be weighing the cost of maintaining a 
> translation layer between API semantics vs. a translation layer between the 
> native protocol and the storage engine we already have today; lot of work to 
> just be where we are today IMO.
> 
> We've learned the hard way that when you remove functionality from the 
> database it hurts a lot of users in a lot of ways and we all discussed and 
> broadly had a consensus to try not to remove anything going forward on the 
> dev ML in the past year as I recall. Removing our core query language would 
> be... quite the opposite of what we discussed and agreed to.
> 
> Now - SQL layer on top of the storage engine? If people want to work on that 
> I think it'd be great for our ecosystem. To Chris' point, I think there's 
> probably appetite from users' perspectives to have different APIs to interact 
> with data in the storage engine, be it gRPC, GraphQL, JSON, CQL over REST, 
> CQL, SQL, etc. Us having a layer that allowed us to reasonably build in that 
> functionality would be a net win.
> 
> On Tue, Nov 4, 2025, at 12:36 PM, Chris Lohfink wrote:
>> Just throwing my 2 cents in. I'm probably in the unpopular camp of wanting 
>> to move in the other direction, towards a gRPC endpoint that is even more 
>> restrictive than CQL. This is coming from the standpoint of needing to clean 
>> up after mistakes (application/modeling etc., not Cassandra) rather than the 
>> standpoint of trying to sell people on using the database. I would prefer to 
>> see all the features and endpoints we provide work well without breaking 
>> than make cool demos and feature bullet points. That said I know in order 
>> for a database to be successful we need the cool feature sets as well.  CQL 
>> works for now and deprecating that would be an absolute nightmare for people 
>> already using it (i.e. the Thrift migration was not fun for anyone). I say create 
>> a new entrypoint or layer, mark it experimental and allow operators to 
>> disable it but leave the existing CQL interface alone.
>> 
>> Chris
>> 
>> On Tue, Nov 4, 2025 at 10:53 AM Isaac Reath <[email protected]> wrote:
>> I share Joey's opinions on this. Many features that resemble SQL (e.g., 
>> indexes, materialized views) come with caveats that stem from their 
>> implementation details rather than the query language itself. If we expose 
>> these same features through SQL as they are today, I think we'd risk setting 
>> users up for disappointment, since they will come in with implicit 
>> expectations about how a given SQL feature should work based on their 
>> previous experience and more often than not we won't meet that expectation. 
>> At least with CQL we set the expectation that this is a different database, 
>> where familiar concepts might behave differently than you would expect. 
>> 
>> That said, in terms of a long term direction, I think having SQL support is 
>> a good guiding light and implementing it as a stateless component as Jeff 
>> suggests would help make this easier to realize. 
>> 
>> On Tue, Nov 4, 2025 at 10:23 AM Joseph Lynch <[email protected]> wrote:
>> Removing CQL is, in my opinion, completely off the table. When we deprecated 
>> Thrift and gave CQL as the new query language, we imposed significant pain 
>> on our existing functional Thrift applications to migrate to it - I feel we 
>> should not hurt our users like that again.
>> 
>> I worry that we already struggle to implement the current surface area of 
>> CQL correctly and in a way that scales safely. For example, CQL allows us to 
>> create arbitrarily large partitions, but large partitions and large columns 
>> continue to be something our storage engine can't currently handle well. CQL 
>> allows us to create secondary indices for improved filter support, but few 
>> can safely use them in production (or at least we struggle to). We still 
>> struggle with how page timeouts, hedges and retries work in an idempotent 
>> and reliable way in our current protocol - although CQL at least gives us a 
>> path to implementing those.
>> 
>> I wonder if we should focus on being excellent at the basic write and read 
>> operations we already support before adding more complexity at the API 
>> layer. I am excited by the recent proposals around unbounded partitions, 
>> byte ordered partitioner with safe data movement, ability to execute 
>> analytics queries efficiently via a separate columnar representation etc ... 
>> and all of those and more would likely be required to tackle SQL in any 
>> meaningful way.
>> 
>> The surface area of SQL is much much wider, requiring functional 
>> implementation of all of that plus joins, interactive transactions and more. 
>> The SQL protocol itself is also quite poor for reliable communication and 
>> rarely has performant async clients with size based pagination, per page 
>> timeouts, per page hedging, incremental progress over a streaming async 
>> interface, pagination resumption, etc ...  A lot of this difficulty stems 
>> from the protocol often being tied to TCP connections and the inherently 
>> unbounded complexity of the read interface.
>> 
>> I guess I'm saying, I think we should prioritize succeeding at the API scope 
>> we already have before adding more. Deferring to standard SQL syntax or 
>> naming when we can just seems like a good idea (why reinvent concepts), but 
>> I don't think the friction with CQL is because it's not SQL, I think it's 
>> because users can't tell what works and what doesn't work.
>> 
>> -Joey 
>> 
>> On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie <[email protected]> wrote:
>> 
>> +1 to Mick and Aleksey. I think the key for me was this:
>>> One is Cassandra’s wide-partition model with flexible clustering columns, 
>>> which supports very large, ordered partitions (e.g. time-series and 
>>> efficient range scans), rather than a strictly normalised, join-centric 
>>> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s 
>>> query-driven, table-per-query modelling helps move users toward designs 
>>> that scale predictably.
>> 
>> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here 
>> <https://www.postgresql.org/docs/current/sql-explain.html>) for users to be 
>> able to make sense of how their SQL queries translate into underlying disk 
>> access patterns. Having a wide-open field of full SQL compliance that they then 
>> need to understand how to constrain to get horizontal scale out of it would 
>> be much more challenging than the already somewhat "new" cognitive muscle 
>> our users have to build to realize that horizontal scaling of data access 
>> doesn't come free.
>> 
>> I think that would give us a future state of "Use SQL when you need / want a 
>> lot of expressivity, use CQL when you need to be constrained to language 
>> primitives that keep your data access scalable". The part that gets me wary 
>> here is how we've run into pain in the past trying to be both a database 
>> that allows more query expressivity (ALLOW FILTERING, legacy 2i come to 
>> mind) and a database that also wants horizontal scale.
>> 
>> I'd love us to be able to have our cake and eat it too but I don't know if 
>> that's possible. So at the very least I'd advocate for SQL + CQL going 
>> forward, or SQL + a constrained "CQL-like" mode that gives the same 
>> constraints CQL does today on modeling that guide people towards that very 
>> partitionable path.
>> 
>> On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
>>> I don’t mind us implementing some Postgres syntax support in some capacity, 
>>> but I do not like the idea of limiting what Cassandra is allowed to do, or 
>>> expose via CQL, to what is expressible by Postgres’s SQL.
>>> 
>>> Many moons ago, before we started work on native protocol and CQL, I could 
>>> perhaps see a bigger benefit to going the Postgres route - for the client protocol 
>>> and the language. We could piggyback on existing client infrastructure and 
>>> SQL familiarity. But at this stage, when we have already made the effort to 
>>> develop decent drivers, and CQL is fleshed out, and C* is quite mature 
>>> overall, how much would we gain from this transition?
>>> 
>>> I’m broadly with Mick here. And I support using Postgres’ SQL as 
>>> inspiration for implementing new CQL features wherever it makes sense - 
>>> it’s something we’ve been doing for a decade already. But I don’t believe 
>>> that deprecating CQL is the way to go at this point.
>>> 
>>> > On 4 Nov 2025, at 06:38, Mick <[email protected]> wrote:
>>> > 
>>> > 
>>> > 
>>> >> On 3 Nov 2025, at 20:32, Joel Shepherd <[email protected]> wrote:
>>> >> 
>>> >> At the same time, my personal opinion is that if SQL compatibility is 
>>> >> pursued, then the end game should be to deprecate CQL. That will 
>>> >> probably take years, but at the limit I don't see a lot of benefit to 
>>> >> supporting both.
>>> > 
>>> > 
>>> > 
>>> > We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot is 
>>> > obvious, but it is a very broad question.
>>> > 
>>> > The adoption and standardisation benefits are obvious, but CQL has 
>>> > strengths relative to SQL in Cassandra’s context.  
>>> > 
>>> > One is Cassandra’s wide-partition model with flexible clustering columns, 
>>> > which supports very large, ordered partitions (e.g. time-series and 
>>> > efficient range scans), rather than a strictly normalised, join-centric 
>>> > model. These patterns don’t always map cleanly to SQL semantics, and 
>>> > CQL’s query-driven, table-per-query modelling helps move users toward 
>>> > designs that scale predictably.
>>> > 
>>> > I can see CQL continuing as Cassandra’s high-throughput, query-driven 
>>> > DSL, while we pursue SQL compatibility.  I appreciate Dinesh’s ‘lanes’ 
>>> > framing, e.g. eventually default to a SQL interface (with Accord) for the 
>>> > broadest UX, while CQL remains a high-throughput path.
>>> > 
>>> > Should we also be discussing storage-engine implications ?  Cassandra’s 
>>> > LSMT/SSTable design optimises write paths, while SQL presents a logical 
>>> > view without constraining physical layout, so data on disk stays 
>>> > optimised for dominant access patterns.  I can also see the need to 
>>> > discuss transport vs query language differences.
>>> > 
>>> > Are we after both SQL's DML and DDL abilities ?  Beyond accessibility and 
>>> > exploration, SQL often comes with mature tooling for schema change 
>>> > management. Cassandra supports online schema changes (e.g., ALTER TABLE), 
>>> > but cross-table/primary-key changes remain constrained. A SQL interface 
>>> > alone won’t ‘solve’ this: it’s about migration tooling and engine 
>>> > capabilities; changing data models at-scale faces separate challenges.
>>> > 
>>> > Especially outside of early-stage apps and ad-hoc exploration I find SQL 
>>> > less interesting and its ergonomics less aligned with Cassandra’s runtime 
>>> > performance model.  That doesn't make me opposed to the endeavour of SQL 
>>> > compatibility, it pushes me on the why question a bit more for alignment 
>>> > clarity to our strengths.
>>> 
>>> 
>> 
>
