Re: [DISCUSS] SQL support in Cassandra

Josh McKenzie Tue, 04 Nov 2025 11:18:49 -0800

Good point Joey; I was rather focused on the ergonomics of implicit constraints 
that come with CQL vs. SQL and the gap we'd have to bridge to make a 
SQL-centric world have the same design language as CQL today.


We can't afford to drop CQL at this point unless we have an overwhelmingly 
bullet-proof CQL->SQL translation layer that introduces neither new edge cases 
nor performance degradation compared to using CQL directly today. Existing CQL 
applications would have to Just Work when migrated onto some new paradigm where 
the existing CQL native protocol endpoints were deprecated. At that point we'd 
just be weighing the cost of maintaining a translation layer between API 
semantics vs. a translation layer between the native protocol and the storage 
engine we already have today; a lot of work just to be where we are today IMO.

We've learned the hard way that when you remove functionality from the database 
it hurts a lot of users in a lot of ways. As I recall, we discussed this on the 
dev ML in the past year and broadly reached consensus to try not to remove 
anything going forward. Removing our core query language would be... quite the 
opposite of what we discussed and agreed to.

Now - SQL layer on top of the storage engine? If people want to work on that I 
think it'd be great for our ecosystem. To Chris' point, I think there's 
probably appetite from users' perspectives to have different APIs to interact 
with data in the storage engine, be it gRPC, GraphQL, JSON, CQL over REST, CQL, 
SQL, etc. Us having a layer that allowed us to reasonably build in that 
functionality would be a net win.

On Tue, Nov 4, 2025, at 12:36 PM, Chris Lohfink wrote:
> Just throwing my 2 cents in. I'm probably in the unpopular camp of wanting to 
> move the other direction towards a gRPC endpoint that is even more 
> restrictive than CQL. This is coming from the standpoint of needing to clean 
> up after mistakes (application/modeling etc., not Cassandra) rather than the 
> standpoint of trying to sell people on using the database. I would prefer to see all the 
> features and endpoints we provide work well without breaking than make cool 
> demos and feature bullet points. That said I know in order for a database to 
> be successful we need the cool feature sets as well.  CQL works for now and 
> deprecating that would be an absolute nightmare for people *already* using it 
> (i.e. the Thrift migration was not fun for anyone). I say create a new entrypoint 
> or layer, mark it experimental and allow operators to disable it but leave 
> the existing CQL interface alone.
> 
> Chris
> 
> On Tue, Nov 4, 2025 at 10:53 AM Isaac Reath <[email protected]> wrote:
>> I share Joey's opinions on this. Many features that resemble SQL (e.g., 
>> indexes, materialized views) come with caveats that stem from their 
>> implementation details rather than the query language itself. If we expose 
>> these same features through SQL as they are today, I think we'd risk setting 
>> users up for disappointment, since they will come in with implicit 
>> expectations about how a given SQL feature should work based on their 
>> previous experience and more often than not we won't meet that expectation. 
>> At least with CQL we set the expectation that this is a different database, 
>> where familiar concepts might behave differently than you would expect. 
>> 
>> That said, in terms of a long term direction, I think having SQL support is 
>> a good guiding light and implementing it as a stateless component as Jeff 
>> suggests would help make this easier to realize. 
>> 
>> On Tue, Nov 4, 2025 at 10:23 AM Joseph Lynch <[email protected]> wrote:
>>> Removing CQL is, in my opinion, completely off the table. When we 
>>> deprecated Thrift and gave CQL as the new query language, we imposed 
>>> significant pain on our existing functional Thrift applications to migrate 
>>> to it - I feel we should not hurt our users like that again.
>>> 
>>> I worry that we already struggle to implement the current surface area of 
>>> CQL correctly and in a way that scales safely. For example, CQL allows us 
>>> to create arbitrarily large partitions, but large partitions and large 
>>> columns continue to be something our storage engine can't currently handle 
>>> well. CQL allows us to create secondary indices for improved filter support 
>>> but few can safely use them in production (or at least we struggle to). We 
>>> still struggle with how page timeouts, hedges and retries work in an 
>>> idempotent and reliable way in our current protocol - although CQL at least 
>>> gives us a path to implementing those.
>>> 
>>> I wonder if we should focus on being excellent at the basic write and read 
>>> operations we already support before adding more complexity at the API 
>>> layer. I am excited by the recent proposals around unbounded partitions, 
>>> byte ordered partitioner with safe data movement, ability to execute 
>>> analytics queries efficiently via a separate columnar representation etc 
>>> ... and *all* of those and more would likely be *required* to tackle SQL in 
>>> any meaningful way.
>>> 
>>> The surface area of SQL is much much wider, requiring functional 
>>> implementation of all of that plus joins, interactive transactions and 
>>> more. The SQL protocol itself is also quite poor for reliable communication 
>>> and rarely has performant async clients with size based pagination, per 
>>> page timeouts, per page hedging, incremental progress over a streaming 
>>> async interface, pagination resumption, etc ...  A lot of this difficulty 
>>> stems from the protocol often being tied to TCP connections and the 
>>> inherently unbounded complexity of the read interface.
>>> 
>>> I guess I'm saying, I think we should prioritize succeeding at the API 
>>> scope we already have before adding more. Deferring to standard SQL syntax 
>>> or naming when we can just seems like a good idea (why reinvent concepts), 
>>> but I don't think the friction with CQL is because it's not SQL, I think 
>>> it's because users can't tell what works and what doesn't work.
>>> 
>>> -Joey 
>>> 
>>> On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie <[email protected]> wrote:
>>>> +1 to Mick and Aleksey. I think the key for me was this:
>>>>> One is Cassandra’s wide-partition model with flexible clustering columns, 
>>>>> which supports very large, ordered partitions (e.g. time-series and 
>>>>> efficient range scans), rather than a strictly normalised, join-centric 
>>>>> model. These patterns don’t always map cleanly to SQL semantics, and 
>>>>> CQL’s query-driven, table-per-query modelling helps move users toward 
>>>>> designs that scale predictably.
>>>> 
>>>> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here 
>>>> <https://www.postgresql.org/docs/current/sql-explain.html>) for users to 
>>>> be able to make sense of how their SQL queries translate into underlying 
>>>> disk access patterns. Having a wide-open field of full SQL compliance they 
>>>> then need to understand how to constrain to get horizontal scale out of it 
>>>> would be *much more challenging* than the already somewhat "new" cognitive 
>>>> muscle our users have to build to realize that horizontal scaling of data 
>>>> access doesn't come free.
>>>> 
>>>> I think that would give us a future state of "Use SQL when you need / want 
>>>> a lot of expressivity, use CQL when you need to be constrained to language 
>>>> primitives that keep your data access scalable". The part that gets me 
>>>> wary here is how we've run into pain in the past trying to be both a 
>>>> database that allows more query expressivity (ALLOW FILTERING, legacy 2i 
>>>> come to mind) and a database that also wants horizontal scale.
>>>> 
>>>> I'd love us to be able to have our cake and eat it too but I don't know if 
>>>> that's possible. So at the very least I'd advocate for SQL + CQL going 
>>>> forward, or SQL + a constrained "CQL-like" mode that gives the same 
>>>> constraints CQL does today on modeling that guide people towards that very 
>>>> partitionable path.
>>>> 
>>>> On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
>>>>> I don’t mind us implementing some Postgres syntax support in some 
>>>>> capacity, but I do not like the idea of limiting what Cassandra is 
>>>>> allowed to do, or expose via CQL, to what is expressible by Postgres’s 
>>>>> SQL.
>>>>> 
>>>>> Many moons ago, before we started work on the native protocol and CQL, I 
>>>>> could perhaps see a bigger benefit to going the Postgres route - for the 
>>>>> client protocol and the language. We could piggyback on existing client 
>>>>> infrastructure and SQL familiarity. But at this stage, when we have 
>>>>> already made the effort to develop decent drivers, and CQL is fleshed 
>>>>> out, and C* is quite mature overall, how much would we gain from this 
>>>>> transition?
>>>>> 
>>>>> I’m broadly with Mick here. And I support using Postgres’ SQL as 
>>>>> inspiration for implementing new CQL features wherever it makes sense - 
>>>>> it’s something we’ve been doing for a decade already. But I don’t believe 
>>>>> that deprecating CQL is the way to go at this point.
>>>>> 
>>>>> > On 4 Nov 2025, at 06:38, Mick <[email protected]> wrote:
>>>>> > 
>>>>> > 
>>>>> > 
>>>>> >> On 3 Nov 2025, at 20:32, Joel Shepherd <[email protected]> wrote:
>>>>> >> 
>>>>> >> At the same time, my personal opinion is that if SQL compatibility is 
>>>>> >> pursued, then the end game should be to deprecate CQL. That will 
>>>>> >> probably take years, but at the limit I don't see a lot of benefit to 
>>>>> >> supporting both.
>>>>> > 
>>>>> > 
>>>>> > 
>>>>> > We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot is 
>>>>> > obvious, but it is a very broad question.
>>>>> > 
>>>>> > The adoption and standardisation benefits are obvious, but CQL has 
>>>>> > strengths relative to SQL in Cassandra’s context.  
>>>>> > 
>>>>> > One is Cassandra’s wide-partition model with flexible clustering 
>>>>> > columns, which supports very large, ordered partitions (e.g. 
>>>>> > time-series and efficient range scans), rather than a strictly 
>>>>> > normalised, join-centric model. These patterns don’t always map cleanly 
>>>>> > to SQL semantics, and CQL’s query-driven, table-per-query modelling 
>>>>> > helps move users toward designs that scale predictably.
>>>>> > 
>>>>> > I can see CQL continuing as Cassandra’s high-throughput, query-driven 
>>>>> > DSL, while we pursue SQL compatibility.  I appreciate Dinesh’s ‘lanes’ 
>>>>> > framing, e.g. eventually default to a SQL interface (with Accord) for 
>>>>> > the broadest UX, while CQL remains a high-throughput path.
>>>>> > 
>>>>> > Should we also be discussing storage-engine implications ?  Cassandra’s 
>>>>> > LSMT/SSTable design optimises write paths, while SQL presents a 
>>>>> > logical view without constraining physical layout, so data on disk 
>>>>> > stays optimised for dominant access patterns.  I can also see the need 
>>>>> > to discuss transport vs query language differences.
>>>>> > 
>>>>> > Are we after both SQL's DML and DDL abilities ?  Beyond accessibility 
>>>>> > and exploration, SQL often comes with mature tooling for schema change 
>>>>> > management. Cassandra supports online schema changes (e.g., ALTER 
>>>>> > TABLE), but cross-table/primary-key changes remain constrained. A SQL 
>>>>> > interface alone won’t ‘solve’ this: it’s about migration tooling and 
>>>>> > engine capabilities; changing data models at-scale faces separate 
>>>>> > challenges.
>>>>> > 
>>>>> > Especially outside of early-stage apps and ad-hoc exploration I find 
>>>>> > SQL less interesting and its ergonomics less aligned with Cassandra’s 
>>>>> > runtime performance model.  That doesn't make me opposed to the 
>>>>> > endeavour of SQL compatibility; it just pushes me on the why question a 
>>>>> > bit more, so that the effort stays clearly aligned with our strengths.
>>>>> 
>>>>> 
>>>>
