Re: [DISCUSS] SQL support in Cassandra

Dinesh Joshi Wed, 05 Nov 2025 10:44:46 -0800

There are two distinct conversations in this thread.

1. What does the evolution of CQL Syntax look like?
2. What is the path to bring SQL to Cassandra?


I suggest we fork two [DISCUSS] threads so that each topic gets a focused
discussion.

Thanks,

Dinesh

On Wed, Nov 5, 2025 at 10:29 AM David Capwell <[email protected]> wrote:

> My personal stance is that new work should look at existing syntax and ask
> the question “why are we different”, if the answer is “I prefer this” or “I
> didn’t have the time”, I want to push back against this and argue for SQL /
> Postgres w/e possible.  If the answer is “correctness” or “performance” I
> am far more open to do things our own way.
>
> Given the above, I don’t like having a requirement we must be SQL /
> Postgres compliant, but I do think it's a good guidepost to keep in mind
> when we are doing something new.
>
> I worry that we already struggle to implement the current surface area of
> CQL correctly and in a way that scales safely.
>
>
> This has been a big issue for me over the past few years, when we
> implement features correctness / semantics have not historically been given
> the thought I feel that they deserve; we have so many weird behaviors that
> leak into user land (batch / CAS failures come to mind as they are
> constantly making me sad… why is the “short” type variable length? WHY DO
> WE HAVE MEANINGLESS EMPTYNESS!!!!); we have gotten much better over the
> years though… not all negative here =)
>
> SQL has been building its surface area for decades and trying to catch up
> is a significant effort and how to make things correct and performant
> becomes an issue.  In the latest spec there is now support for graph
> queries, so signing up to be compatible means we need to implement the below
>
> SELECT *
> FROM GRAPH_TABLE(my_graph
>     MATCH (a IS person)-[e IS friends]->(b IS person WHERE b.name = 'Alice')
>     WHERE a.name = 'Mary'
>     COLUMNS (a.name AS person_a, b.name AS person_b)
> );
>
> The above is just a simple example; it gets far more complex
> and would be harder for C* to support.
>
>
> I would be curious to see a gap analysis between CQL and SQL that includes
> the differences in behaviors. I suspect that it will bring a few surprises
> and provide some more solid foundation to this discussion.
>
>
> I think this is a good starting point.  There are some nice things in SQL
> missing in C* that could be implemented without a ton of risk, and opening
> up the discussion around these areas makes sense to me.
>
> Off the top of my head, here are basic queries that work in SQL but not
> CQL, and that carry very little risk to support.
>
> SELECT 1 -- simple query to test if the connection is still live
>
> SELECT func(42) FROM system.peers; -- this has led someone I know to have
> to implement functions that return constants specifically to work around
> this limitation…
>
>
>
> On Nov 5, 2025, at 9:15 AM, Jeff Jirsa <[email protected]> wrote:
>
> CQL just to demonstrate it’s possible
>
> Fat node style would indeed be faster, but I'm mostly proving that it's
> functional
>
> On Nov 5, 2025, at 8:55 AM, Joseph Lynch <[email protected]> wrote:
>
> 
> I very much like Jeff, Josh et al.'s proposals around the pluggable
> stateless API layer. Also I agree with Chris I would prefer a simpler API
> not a more complex one for our applications to couple to e.g. the Java
> stdlib. This also sets up a really nice path where the community members
> can build the layers that make sense first out-of-tree, and as a project we
> can choose the successful ones to bring in-tree. Whichever API those layers
> couple to would be a new semi-public interface though which has to be
> weighed.
>
> Jeff I am curious, in that prototype you are hacking are you interacting
> directly with the internode protocol and verb system or going through CQL?
> I imagine there could be some strengths to going straight to the internode?
>
> -Joey
>
> On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie <[email protected]> wrote:
>
>> Again from
>>
>> Right. I'm just zooming out a bit more and applying that same logical
>> pattern broadly to other API language domains, not just SQL. But yes - your
>> point definitely stands.
>>
>> On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
>>
>> I’m grooving on what “Cloud Native Jeff” is saying here and I would like
>> to see where this could go. If we use a well established library like
>> Calcite, then there is no API to maintain. We might find parts of Cassandra
>> along the way we could alter to make it easier to integrate, but so far
>> that’s just a premature optimization.
>>
>> Suuuuper interested to see the TPC-C when you have it, Jeff.
>>
>> > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa <[email protected]> wrote:
>> >
>> >
>> >
>> > On 2025/11/04 22:32:08 Josh McKenzie wrote:
>> >>
>> >> So I guess what I'm noodling on here is a superset of what Patrick is
>> w/a slight modification, where we double down on CQL as being the "low
>> level high performance" API for C*, and have SQL and other APIs built on
>> top of that.
>> >>
>> >
>> > Again from
>> https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
>> >
>> >> Or is it building a native SQL implementation stateless on top of a
>> backing ordered (ByteOrderedPartitioner), transactional (accord), key-value
>> cassandra cluster ? It’s an extra hop, but trying to adjust the existing
>> grammar / DDL to fit into a language it always mimicked but never
>> implemented faithfully feels like a bumpy road, where there are many
>> successful existence proofs for building it stateless a layer above.
>> >
>> > TiKV / TiDB, FoundationDB, etc, etc, etc.
>> >
>> > If you have a transactional, performant, ordered KV store, you can
>> build almost any high-level database on top of it. You can expose even
>> lower layer primitives (like placement) to optimize for it.
>>
>>
>>
>>
>
