My personal stance is that new work should look at existing syntax and ask the
question “why are we different?” If the answer is “I prefer this” or “I didn’t
have the time”, I want to push back and argue for SQL / Postgres wherever
possible. If the answer is “correctness” or “performance”, I am far more open to
doing things our own way.
Given the above, I don’t like having a requirement that we must be SQL /
Postgres compliant, but I do think it’s a good guidepost to keep in mind when we
are doing something new.
> I worry that we already struggle to implement the current surface area of CQL
> correctly and in a way that scales safely.
This has been a big issue for me over the past few years. When we implement
features, correctness / semantics have not historically been given the thought I
feel they deserve, and we have so many weird behaviors that leak into user land
(batch / CAS failures come to mind, as they are constantly making me sad… why is
the “short” type variable length? WHY DO WE HAVE MEANINGLESS EMPTINESS!!!!).
We have gotten much better over the years though… not all negative here =)
SQL has been building its surface area for decades; catching up is a
significant effort, and making all of it correct and performant becomes an
issue in its own right. The latest spec now includes graph queries (SQL/PGQ), so
signing up to be compatible means we would need to implement queries like the
following:
SELECT *
FROM GRAPH_TABLE(my_graph
MATCH (a IS person)-[e IS friends]->(b IS person WHERE b.name = 'Alice')
WHERE a.name = 'Mary'
COLUMNS (a.name AS person_a, b.name AS person_b)
);
The above is just a simple example; real queries get far more complex and would
be even harder for C* to support.
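For a single fixed-length pattern like the one above, the GRAPH_TABLE query boils down to scanning the edge table, joining both endpoints back to the vertex table, and applying the WHERE filters. A minimal Python sketch of just those semantics, assuming hypothetical in-memory `people` / `friends` data and a hypothetical `match_friends` helper; it models only this one pattern, not how any engine (Cassandra or otherwise) would plan or execute it:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Person:
    id: int
    name: str


# Hypothetical toy data: a "person" vertex table and a directed "friends" edge table.
people = [Person(1, "Mary"), Person(2, "Alice"), Person(3, "Bob")]
friends = [(1, 2), (1, 3), (3, 2)]  # (source person id, destination person id)


def match_friends(people: List[Person], friends: List[Tuple[int, int]],
                  a_name: str, b_name: str) -> List[Tuple[str, str]]:
    """Evaluate (a IS person)-[e IS friends]->(b IS person) with the name
    filters, returning (person_a, person_b) rows like the COLUMNS clause."""
    by_id = {p.id: p for p in people}
    rows = []
    for src, dst in friends:                    # scan the edge table
        a, b = by_id.get(src), by_id.get(dst)   # join both endpoints to person
        if a is None or b is None:
            continue                            # dangling edge: no match
        if a.name == a_name and b.name == b_name:  # a.name = 'Mary', b.name = 'Alice'
            rows.append((a.name, b.name))
    return rows


print(match_friends(people, friends, "Mary", "Alice"))  # [('Mary', 'Alice')]
```

Even this toy case is two joins plus filters; every richer pattern form the spec allows would need its own semantics, planning, and execution support.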
> I would be curious to see a gap analysis between CQL and SQL that include the
> differences in behaviors. I suspect that it will bring a few surprises and
> provide some more solid foundation to this discussion.
I think this is a good starting point. There are some nice things in SQL
missing in C* that could be implemented without a ton of risk, and opening up
the discussion around these areas makes sense to me.
Off the top of my head, here are some basic queries that work in SQL but not in
CQL, and that would carry very little risk to support:
SELECT 1;  -- simple query to test whether the connection is still live
SELECT func(42) FROM system.peers;  -- this has led someone I know to implement
                                    -- functions that return constants purely to
                                    -- work around this limitation…
> On Nov 5, 2025, at 9:15 AM, Jeff Jirsa wrote:
>
> CQL just to demonstrate it’s possible
>
> Fat node style would indeed be faster, but I’m mostly proving that it’s
> functional
>
>> On Nov 5, 2025, at 8:55 AM, Joseph Lynch wrote:
>>
>>
>> I very much like Jeff, Josh et al.'s proposals around the pluggable
>> stateless API layer. I also agree with Chris: I would prefer a simpler API,
>> not a more complex one, for our applications to couple to (e.g. the Java
>> stdlib). This also sets up a really nice path where community members can
>> build the layers that make sense out-of-tree first, and as a project we can
>> choose the successful ones to bring in-tree. Whichever API those layers
>> couple to would be a new semi-public interface, though, which has to be
>> weighed.
>>
>> Jeff, I am curious: in that prototype you are hacking, are you interacting
>> directly with the internode protocol and verb system, or going through CQL?
>> I imagine there could be some strengths to going straight to internode?
>>
>> -Joey
>>
>> On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie wrote:
>>>> Again from
>>> Right. I'm just zooming out a bit more and applying that same logical
>>> pattern broadly to other API language domains, not just SQL. But yes - your
>>> point definitely stands.
>>>
>>> On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
>>>> I’m grooving on what “Cloud Native Jeff” is saying here and I would like
>>>> to see where this could go. If we use a well-established library like
>>>> Calcite, then there is no API to maintain. We might find parts of
>>>> Cassandra along the way that we could alter to make integration easier, but
>>>> so far that’s just a premature optimization.
>>>>
>>>> Suuuuper interested to see the TPC-C when you have it, Jeff.
>>>>
>>>> > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa wrote:
>>>> >
>>>> >
>>>> >
>>>> > On 2025/11/04 22:32:08 Josh McKenzie wrote:
>>>> >>
>>>> >> So I guess what I'm noodling on here is a superset of what Patrick is
>>>> >> proposing, with a slight modification, where we double down on CQL as
>>>> >> being the "low level high performance" API for C*, and have SQL and
>>>> >> other APIs built on top of that.
>>>> >>
>>>> >
>>>> > Again from
>>>> > https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
>>>> >
>>>> >> Or is it building a native SQL implementation, stateless, on top of a
>>>> >> backing ordered (ByteOrderedPartitioner), transactional (Accord),
>>>> >> key-value Cassandra cluster? It’s an extra hop, but trying to adjust
>>>> >> the existing grammar / DDL to fit a language it always mimicked but
>>>> >> never implemented faithfully feels like a bumpy road, when there are
>>>> >> many successful existence proofs for building it as a stateless layer
>>>> >> above.
>>>> >
>>>> > TiKV / TiDB, FoundationDB, etc, etc, etc.
>>>> >
>>>> > If you have a transactional, performant, ordered KV store, you can build
>>>> > almost any high-level database on top of it. You can expose even
>>>> > lower-layer primitives (like placement) to optimize for it.
>>>>
>>>>
>>>
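To make the layering argument quoted above a bit more concrete: a common technique in layered systems like those mentioned (TiDB on TiKV, FoundationDB layers) is to encode rows as order-preserving keys, so that relational reads turn into range scans over the KV store. A minimal Python sketch under loud assumptions: `OrderedKV` is a hypothetical single-process stand-in for the backing store (no transactions, no distribution, no Accord), `row_key` / `table_range` are illustrative helpers rather than any Cassandra API, and only unsigned 32-bit integer table ids and primary keys are handled:

```python
import bisect
import struct
from typing import Dict, Iterator, List, Tuple


class OrderedKV:
    """Hypothetical stand-in for an ordered KV store: one in-memory sorted
    key list, with no transactions and no distribution."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._vals: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)  # keep keys sorted by raw bytes
        self._vals[key] = value

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            key = self._keys[i]
            yield key, self._vals[key]
            i += 1


def row_key(table_id: int, pk: int) -> bytes:
    # Fixed-width big-endian unsigned ints: byte order == numeric order, so a
    # table's rows are contiguous in the keyspace and sorted by primary key.
    return struct.pack(">II", table_id, pk)


def table_range(table_id: int, lo: int, hi: int) -> Tuple[bytes, bytes]:
    """Half-open key range covering primary keys in [lo, hi) of one table."""
    return row_key(table_id, lo), row_key(table_id, hi)


# Usage: a hypothetical "users" table with id 1 and a second table with id 2.
kv = OrderedKV()
USERS = 1
for pk, name in [(3, b"carol"), (1, b"alice"), (2, b"bob")]:
    kv.put(row_key(USERS, pk), name)
kv.put(row_key(2, 1), b"row in another table")

# Roughly what a layer might do for: SELECT name FROM users WHERE id >= 1 AND id < 3
start, end = table_range(USERS, 1, 3)
print([value for _, value in kv.scan(start, end)])  # [b'alice', b'bob']
```

The fixed-width big-endian encoding is what makes byte order match numeric order; with an order-preserving partitioner such as ByteOrderedPartitioner, a contiguous key range like this one maps to a contiguous slice of the token ring instead of being scattered across it.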