Re: [DISCUSS] SQL support in Cassandra

Patrick McFadin Wed, 05 Nov 2025 10:48:32 -0800

I agree on splitting this up. I'll do that today.

Patrick


On Wed, Nov 5, 2025 at 10:44 AM Dinesh Joshi <[email protected]> wrote:

> There are two distinct conversations in this thread.
>
> 1. What does the evolution of CQL Syntax look like?
> 2. What is the path to bring SQL to Cassandra?
>
> I suggest we fork 2 discuss threads to have a focused discussion on each
> topic.
>
> Thanks,
>
> Dinesh
>
> On Wed, Nov 5, 2025 at 10:29 AM David Capwell <[email protected]> wrote:
>
>> My personal stance is that new work should look at existing syntax and
>> ask the question “why are we different”, if the answer is “I prefer this”
>> or “I didn’t have the time”, I want to push back against this and argue for
>> SQL / Postgres w/e possible.  If the answer is “correctness” or
>> “performance” I am far more open to do things our own way.
>>
>> Given the above, I don’t like having a requirement that we must be SQL /
>> Postgres compliant, but I do think it's a good guidepost to keep in mind
>> when we are doing something new.
>>
>> I worry that we already struggle to implement the current surface area of
>> CQL correctly and in a way that scales safely.
>>
>>
>> This has been a big issue for me over the past few years, when we
>> implement features correctness / semantics have not historically been given
>> the thought I feel that they deserve; we have so many weird behaviors that
>> leak into user land (batch / CAS failures come to mind as they are
>> constantly making me sad… why is the “short” type variable length? WHY DO
>> WE HAVE MEANINGLESS EMPTYNESS!!!!); we have gotten much better over the
>> years though… not all negative here =)
>>
>> SQL has been building its surface area for decades and trying to catch up
>> is a significant effort and how to make things correct and performant
>> becomes an issue.  In the latest spec there is now support for graph
>> queries, so signing up to be compatible means we need to implement the below
>>
>> SELECT *
>> FROM GRAPH_TABLE(my_graph
>>     MATCH (a IS person)-[e IS friends]->(b IS person WHERE b.name = 'Alice')
>>     WHERE a.name = 'Mary'
>>     COLUMNS (a.name AS person_a, b.name AS person_b)
>> );
>>
>> The above is just a simple example; it gets far more complex
>> and would be harder for C* to support.
>>
>>
>> I would be curious to see a gap analysis between CQL and SQL that includes
>> the differences in behaviors. I suspect that it will bring a few surprises
>> and provide a more solid foundation for this discussion.
>>
>>
>> I think this is a good starting point.  There are some nice things in SQL
>> missing in C* that could be implemented without a ton of risk, and opening
>> up the discussion around these areas makes sense to me.
>>
>> Off the top of my head, here are basic queries that work in SQL but not
>> CQL, and that would carry very little risk to support.
>>
>> SELECT 1 — simple query to test if the connection is still live
>>
>> SELECT func(42) FROM system.peers; — this has led someone I know to have
>> to implement functions that return constants specifically to work around
>> this limitation…
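For readers unfamiliar with the SQL behavior being described, here is a minimal sketch using Python's standard `sqlite3` module. It is only a stand-in for "a SQL engine": SQLite is not Cassandra, and the `peers` table and the `is_alive` helper are hypothetical names made up for this illustration. It shows a table-less `SELECT 1` used as a liveness probe, and a function applied to a literal in a query with a FROM clause.

```python
import sqlite3


def is_alive(conn: sqlite3.Connection) -> bool:
    """Return True if the connection can still run a trivial query."""
    try:
        return conn.execute("SELECT 1").fetchone() == (1,)
    except sqlite3.Error:
        # A closed or broken connection raises a subclass of sqlite3.Error.
        return False


conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for a table such as system.peers.
conn.execute("CREATE TABLE peers (host TEXT)")
conn.executemany("INSERT INTO peers VALUES (?)", [("10.0.0.1",), ("10.0.0.2",)])

alive_before = is_alive(conn)

# A function applied to a constant, evaluated once per row of the table.
per_row = conn.execute("SELECT abs(-42) FROM peers").fetchall()

conn.close()
alive_after = is_alive(conn)
```

In SQL, both queries work without any user-defined helper function, which is the gap the message above is pointing at.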
>>
>>
>>
>> On Nov 5, 2025, at 9:15 AM, Jeff Jirsa <[email protected]> wrote:
>>
>> CQL just to demonstrate it’s possible
>>
>> Fat node style would indeed be faster, but I'm mostly proving that it's
>> functional
>>
>> On Nov 5, 2025, at 8:55 AM, Joseph Lynch <[email protected]> wrote:
>>
>>
>> I very much like Jeff, Josh et al.'s proposals around the pluggable
>> stateless API layer. Also I agree with Chris I would prefer a simpler API
>> not a more complex one for our applications to couple to e.g. the Java
>> stdlib. This also sets up a really nice path where the community members
>> can build the layers that make sense first out-of-tree, and as a project we
>> can choose the successful ones to bring in-tree. Whichever API those layers
>> couple to would be a new semi-public interface though which has to be
>> weighed.
>>
>> Jeff I am curious, in that prototype you are hacking are you interacting
>> directly with the internode protocol and verb system or going through CQL?
>> I imagine there could be some strengths to going straight to the internode?
>>
>> -Joey
>>
>> On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie <[email protected]>
>> wrote:
>>
>>> Again from
>>>
>>> Right. I'm just zooming out a bit more and applying that same logical
>>> pattern broadly to other API language domains, not just SQL. But yes - your
>>> point definitely stands.
>>>
>>> On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
>>>
>>> I’m grooving on what “Cloud Native Jeff” is saying here and I would like
>>> to see where this could go. If we use a well established library like
>>> Calcite, then there is no API to maintain. We might find parts of Cassandra
>>> along the way we could alter to make it easier to integrate, but so far
>>> that’s just a premature optimization.
>>>
>>> Suuuuper interested to see the TPC-C when you have it, Jeff.
>>>
>>> > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa <[email protected]> wrote:
>>> >
>>> >
>>> >
>>> > On 2025/11/04 22:32:08 Josh McKenzie wrote:
>>> >>
>>> >> So I guess what I'm noodling on here is a superset of what Patrick is
>>> >> w/a slight modification, where we double down on CQL as being the "low
>>> >> level high performance" API for C*, and have SQL and other APIs built on
>>> >> top of that.
>>> >>
>>> >
>>> > Again from
>>> > https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
>>> >
>>> >> Or is it building a native SQL implementation stateless on top of a
>>> >> backing ordered (ByteOrderedPartitioner), transactional (accord), key-value
>>> >> cassandra cluster? It’s an extra hop, but trying to adjust the existing
>>> >> grammar / DDL to fit into a language it always mimicked but never
>>> >> implemented faithfully feels like a bumpy road, where there are many
>>> >> successful existence proofs for building it stateless a layer above.
>>> >
>>> > TiKV / TiDB, FoundationDB, etc, etc, etc.
>>> >
>>> > If you have a transactional, performant, ordered KV store, you can
>>> > build almost any high-level database on top of it. You can expose even
>>> > lower-layer primitives (like placement) to optimize for it.
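To make the layering idea in the quoted message concrete, here is a toy sketch of a stateless row layer over an ordered key-value store. Everything in it is an assumption made for illustration: `OrderedKV`, `SqlLayer`, and `row_key` are hypothetical names, the store is an in-memory sorted list rather than a Cassandra cluster, and transactions (the part Accord would provide) are left out entirely. It shows only how byte-ordered keys let a primary-key range predicate become a single ordered scan.

```python
import bisect
import struct


class OrderedKV:
    """Toy in-memory ordered key-value store (stand-in for the backing cluster)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted byte order
        self._vals = {}

    def put(self, key: bytes, value) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key: bytes):
        return self._vals.get(key)

    def scan(self, start: bytes, end: bytes):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._vals[k]) for k in self._keys[lo:hi]]


def row_key(table: str, pk: int) -> bytes:
    # Table name, NUL separator, then the primary key as a big-endian
    # unsigned 64-bit integer so that byte order matches numeric order.
    return table.encode() + b"\x00" + struct.pack(">Q", pk)


class SqlLayer:
    """Stateless layer: holds no data itself, only translates to KV calls."""

    def __init__(self, kv: OrderedKV):
        self.kv = kv

    def insert(self, table: str, pk: int, row: dict) -> None:
        self.kv.put(row_key(table, pk), row)

    def select_by_pk(self, table: str, pk: int):
        return self.kv.get(row_key(table, pk))

    def select_range(self, table: str, lo: int, hi: int):
        # Roughly: SELECT * FROM table WHERE pk >= lo AND pk < hi
        return [row for _, row in self.kv.scan(row_key(table, lo), row_key(table, hi))]


sql = SqlLayer(OrderedKV())
sql.insert("person", 3, {"name": "Bob"})
sql.insert("person", 1, {"name": "Mary"})
sql.insert("person", 2, {"name": "Alice"})
sql.insert("pet", 2, {"name": "Rex"})

in_range = sql.select_range("person", 1, 3)
single = sql.select_by_pk("pet", 2)
```

Because the layer keeps no state of its own, any number of such translators could sit in front of the same store, which is the "extra hop" trade-off described above.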
>>>
>>>
>>>
>>>
>>
