Re: [DISCUSS] SQL support in Cassandra

Benjamin Lerer Tue, 04 Nov 2025 08:32:36 -0800

I would be curious to see a gap analysis between CQL and SQL that includes
the differences in behaviors. I suspect that it will bring a few surprises
and provide some more solid foundation to this discussion.


On Tue, Nov 4, 2025 at 5:24 PM Štefan Miklošovič <[email protected]> wrote:

> I just want to ask this question ... feel free to shoot it down, just
> curious about the feedback / pros / cons.
>
> When we talk about "joins", yeah, it is not supported as we are used
> to in the SQL world. But joins _are_ possible, via Spark (Cassandra
> connector) / via Spark itself.
>
> Now that we have Cassandra Analytics, why could we not integrate it
> with Cassandra (as something pluggable)? Basically, a user would
> execute
>
> USE shop;
>
> SELECT customers.name, orders.item FROM customers JOIN orders ON
> customers.id = orders.customer_id;
>
> Then we take this "CQL" query, construct logic for Spark behind that,
> put that to Analytics / Spark or whatever under the hood and present
> the result back to a caller?
>
> For now, we need to develop a custom Spark application, then to deploy
> it, then interpret the results and so on. I just do not see why we
> could not optionally integrate Spark into Cassandra in such a way,
> really something pluggable, which would enable this kind of query. I
> just do not want to write any custom Spark app just to join two
> tables. Just delegate this kind of a query to Spark, wait for the
> result, and display it to me?
>
> On Tue, Nov 4, 2025 at 5:09 PM Aaron <[email protected]> wrote:
> >
> > Overall I like this idea. It will help us lower the learning curve for
> Cassandra, making it feel like a more viable option for folks who might not
> otherwise have considered it. Keeping CQL and SQL as parallel options is
> the approach that I would prefer, as well.
> >
> > Might not be a bad idea to classify SQL commands as OLTP vs. OLAP, and
> have v1 be just OLTP, with commands that are more often used in an OLAP
> paradigm to follow in v2. Doesn't have to be that, but it might be worth
> our time to see if there are logical ways that we can break up the workload
> of a SQL implementation into more manageable pieces.
> >
> >>  I don't think the friction with CQL is because it's not SQL, I think
> it's because users can't tell what works and what doesn't work.
> >
> >
> > I don't think this is the main motivation here. The motivation for doing
> this is (should be) meeting a standard embraced by most other databases
> because it will ultimately help our users. We should want a developer (who
> has never touched Cassandra before) to be able to sit down and be
> productive with their existing skillset.
> >
> > We should also want to take some of the pain out of moving an existing
> application. It may not end up being as simple as re-pointing an
> application from Postgres to Cassandra, but reducing the friction involved
> should be a consideration.
> >
> > Thanks,
> >
> > Aaron
> >
> >
> > On Tue, Nov 4, 2025 at 9:23 AM Joseph Lynch <[email protected]>
> wrote:
> >>
> >> Removing CQL is, in my opinion, completely off the table. When we
> deprecated Thrift and gave CQL as the new query language, we imposed
> significant pain on our existing functional Thrift applications to migrate
> to it - I feel we should not hurt our users like that again.
> >>
> >> I worry that we already struggle to implement the current surface area
> of CQL correctly and in a way that scales safely. For example, CQL allows
> us to create arbitrarily large partitions, but large partitions and large
> columns continue to be something our storage engine can't currently handle
> well. CQL allows us to create secondary indices for improved filter support
> but few can (or at least we struggle to) safely use them in production. We
> still struggle with how page timeouts, hedges and retries work in an
> idempotent and reliable way in our current protocol - although CQL at least
> gives us a path to implementing those.
> >>
> >> I wonder if we should focus on being excellent at the basic write and
> read operations we already support before adding more complexity at the API
> layer. I am excited by the recent proposals around unbounded partitions,
> byte ordered partitioner with safe data movement, ability to execute
> analytics queries efficiently via a separate columnar representation etc
> ... and all of those and more would likely be required to tackle SQL in any
> meaningful way.
> >>
> >> The surface area of SQL is much much wider, requiring functional
> implementation of all of that plus joins, interactive transactions and
> more. The SQL protocol itself is also quite poor for reliable communication
> and rarely has performant async clients with size based pagination, per
> page timeouts, per page hedging, incremental progress over a streaming
> async interface, pagination resumption, etc ...  A lot of this difficulty
> stems from the protocol often being tied to TCP connections and the
> inherently unbounded complexity of the read interface.
> >>
> >> I guess I'm saying, I think we should prioritize succeeding at the API
> scope we already have before adding more. Deferring to standard SQL syntax
> or naming when we can just seems like a good idea (why reinvent concepts),
> but I don't think the friction with CQL is because it's not SQL, I think
> it's because users can't tell what works and what doesn't work.
> >>
> >> -Joey
> >>
> >> On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie <[email protected]>
> wrote:
> >>>
> >>> +1 to Mick and Aleksey. I think the key for me was this:
> >>>
> >>> One is Cassandra’s wide-partition model with flexible clustering
> columns, which supports very large, ordered partitions (e.g. time-series
> and efficient range scans), rather than a strictly normalised, join-centric
> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
> query-driven, table-per-query modelling helps move users toward designs
> that scale predictably.
> >>>
> >>>
> >>> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here)
> for users to be able to make sense of how their SQL queries translate into
> underlying disk access patterns. Having a wide-open field of full SQL
> compliance they then need to understand how to constrain to get horizontal
> scale out of it would be much more challenging than the already somewhat
> "new" cognitive muscle our users have to build to realize that horizontal
> scaling of data access doesn't come free.
> >>>
> >>> I think that would give us a future state of "Use SQL when you need /
> want a lot of expressivity, use CQL when you need to be constrained to
> language primitives that keep your data access scalable". The part that
> gets me wary here is how we've run into pain in the past trying to be both
> a database that allows more query expressivity (ALLOW FILTERING, legacy 2i
> come to mind) and a database that also wants horizontal scale.
> >>>
> >>> I'd love us to be able to have our cake and eat it too but I don't
> know if that's possible. So at the very least I'd advocate for SQL + CQL
> going forward, or SQL + a constrained "CQL-like" mode that gives the same
> constraints CQL does today on modeling that guide people towards that very
> partitionable path.
> >>>
> >>> On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
> >>>
> >>> I don’t mind us implementing some Postgres syntax support in some
> capacity, but I do not like the idea of limiting what Cassandra is allowed
> to do, or expose via CQL, to what is expressible by Postgres’s SQL.
> >>>
> >>> Many moons ago, before we started work on native protocol and CQL, I
> could perhaps see a bigger benefit to going the Postgres route - for the client
> protocol and the language. We could piggyback on existing client
> infrastructure and SQL familiarity. But at this stage, when we have already
> made the effort to develop decent drivers, and CQL is fleshed out, and C*
> is quite mature overall, how much would we gain from this transition?
> >>>
> >>> I’m broadly with Mick here. And I support using Postgres’ SQL as
> inspiration for implementing new CQL features wherever it makes sense -
> it’s something we’ve been doing for a decade already. But I don’t believe
> that deprecating CQL is the way to go at this point.
> >>>
> >>> > On 4 Nov 2025, at 06:38, Mick <[email protected]> wrote:
> >>> >
> >>> >
> >>> >
> >>> >> On 3 Nov 2025, at 20:32, Joel Shepherd <[email protected]> wrote:
> >>> >>
> >>> >> At the same time, my personal opinion is that if SQL compatibility
> is pursued, then the end game should be to deprecate CQL. That will
> probably take years, but at the limit I don't see a lot of benefit to
> supporting both.
> >>> >
> >>> >
> >>> >
> >>> > We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot
> is obvious, but it is a very broad question.
> >>> >
> >>> > The adoption and standardisation benefits are obvious, but CQL has
> strengths relative to SQL in Cassandra’s context.
> >>> >
> >>> > One is Cassandra’s wide-partition model with flexible clustering
> columns, which supports very large, ordered partitions (e.g. time-series
> and efficient range scans), rather than a strictly normalised, join-centric
> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
> query-driven, table-per-query modelling helps move users toward designs
> that scale predictably.
> >>> >
> >>> > I can see CQL continuing as Cassandra’s high-throughput,
> query-driven DSL, while we pursue SQL compatibility.  I appreciate Dinesh’s
> ‘lanes’ framing, e.g. eventually default to a SQL interface (with Accord)
> for the broadest UX, while CQL remains a high-throughput path.
> >>> >
> >>> > Should we also be discussing storage-engine implications ?
> Cassandra’s LSMT/SSTable design optimises write paths, while SQL presents
> a logical view without constraining physical layout, so data on disk stays
> optimised for dominant access patterns.  I can also see the need to discuss
> transport vs. query language differences.
> >>> >
> >>> > Are we after both SQL's DML and DDL abilities ?  Beyond
> accessibility and exploration, SQL often comes with mature tooling for
> schema change management. Cassandra supports online schema changes (e.g.,
> ALTER TABLE), but cross-table/primary-key changes remain constrained. A SQL
> interface alone won’t ‘solve’ this: it’s about migration tooling and engine
> capabilities; changing data models at-scale faces separate challenges.
> >>> >
> >>> > Especially outside of early-stage apps and ad-hoc exploration I find
> SQL less interesting and its ergonomics less aligned with Cassandra’s
> runtime performance model.  That doesn't make me opposed to the endeavour
> of SQL compatibility, it pushes me on the why question a bit more for
> alignment clarity to our strengths.
> >>>
> >>>
> >>>
>
