I just want to ask this question ... feel free to shoot it down, just curious about the feedback / pros / cons.

When we talk about "joins": yes, they are not supported the way we are used to in the SQL world. But joins _are_ possible, via Spark (Cassandra connector) or via Spark itself. Now that we have Cassandra Analytics, why couldn't we integrate it with Cassandra as something pluggable? Basically, a user would execute:

    USE shop;
    SELECT customers.name, orders.item FROM customers JOIN orders ON customers.id = orders.customer_id;

Could we then take this "CQL" query, construct the Spark logic behind it, hand that to Analytics / Spark or whatever is under the hood, and present the result back to the caller?

For now, we need to develop a custom Spark application, deploy it, interpret the results, and so on. I just do not see why we could not optionally integrate Spark into Cassandra in such a way, as something really pluggable, which would enable this kind of query. I do not want to write a custom Spark app just to join two tables. Just delegate this kind of query to Spark, wait for the result, and display it to me?

On Tue, Nov 4, 2025 at 5:09 PM Aaron <[email protected]> wrote:
>
> Overall I like this idea. It will help us lower the learning curve for
> Cassandra, making it feel like a more viable option for folks who might not
> otherwise have considered it. Keeping CQL and SQL as parallel options is the
> approach that I would prefer, as well.
>
> Might not be a bad idea to classify SQL commands as OLTP vs. OLAP, and have
> v1 be just OLTP, with commands that are more often used in an OLAP paradigm
> to follow in v2. Doesn't have to be that, but it might be worth our time to
> see if there are logical ways that we can break-up the workload of a SQL
> implementation into more manageable pieces.
>
>> I don't think the friction with CQL is because it's not SQL, I think it's
>> because users can't tell what works and what doesn't work.
>
>
> I don't think this is the main motivation here.
> The motivation for doing this
> is (should be) meeting a standard embraced by most other databases because it
> will ultimately help our users. We should want a developer (who has never
> touched Cassandra before) to be able to sit down and be productive with their
> existing skillset.
>
> We should also want to take some of the pain out of moving an existing
> application. It may not end up being as simple as re-pointing an application
> from Postgres to Cassandra, but reducing the friction involved should be a
> consideration.
>
> Thanks,
>
> Aaron
>
>
> On Tue, Nov 4, 2025 at 9:23 AM Joseph Lynch <[email protected]> wrote:
>>
>> Removing CQL is, in my opinion, completely off the table. When we deprecated
>> Thrift and gave CQL as the new query language, we imposed significant pain
>> on our existing functional Thrift applications to migrate to it - I feel we
>> should not hurt our users like that again.
>>
>> I worry that we already struggle to implement the current surface area of
>> CQL correctly and in a way that scales safely. For example, CQL allows us to
>> create arbitrarily large partitions, but large partitions and large columns
>> continue to be something our storage engine can't currently handle well. CQL
>> allows us to create secondary indices for improved filter support but few
>> can (or at least we struggle) to safely use them in production. We still
>> struggle with how page timeouts, hedges and retries work in an idempotent
>> and reliable way in our current protocol - although CQL at least gives us a
>> path to implementing those.
>>
>> I wonder if we should focus on being excellent at the basic write and read
>> operations we already support before adding more complexity at the API
>> layer. I am excited by the recent proposals around unbounded partitions,
>> byte ordered partitioner with safe data movement, ability to execute
>> analytics queries efficiently via a separate columnar representation etc ...
>> and all of those and more would likely be required to tackle SQL in any
>> meaningful way.
>>
>> The surface area of SQL is much much wider, requiring functional
>> implementation of all of that plus joins, interactive transactions and more.
>> The SQL protocol itself is also quite poor for reliable communication and
>> rarely has performant async clients with size based pagination, per page
>> timeouts, per page hedging, incremental progress over a streaming async
>> interface, pagination resumption, etc ... A lot of this difficulty stems
>> from the protocol often being tied to TCP connections and the inherently
>> unbounded complexity of the read interface.
>>
>> I guess I'm saying, I think we should prioritize succeeding at the API scope
>> we already have before adding more. Deferring to standard SQL syntax or
>> naming when we can just seems like a good idea (why reinvent concepts), but
>> I don't think the friction with CQL is because it's not SQL, I think it's
>> because users can't tell what works and what doesn't work.
>>
>> -Joey
>>
>> On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie <[email protected]> wrote:
>>>
>>> +1 to Mick and Aleksey. I think the key for me was this:
>>>
>>> One is Cassandra’s wide-partition model with flexible clustering columns,
>>> which supports very large, ordered partitions (e.g. time-series and
>>> efficient range scans), rather than a strictly normalised, join-centric
>>> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
>>> query-driven, table-per-query modelling helps move users toward designs
>>> that scale predictably.
>>>
>>>
>>> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here) for
>>> users to be able to make sense of how their SQL queries translate into
>>> underlying disk access patterns.
>>> Having a wide-open field of full SQL
>>> compliance they then need to understand how to constrain to get horizontal
>>> scale out of it would be much more challenging than the already somewhat
>>> "new" cognitive muscle our users have to build to realize that horizontal
>>> scaling of data access doesn't come free.
>>>
>>> I think that would give us a future state of "Use SQL when you need / want
>>> a lot of expressivity, use CQL when you need to be constrained to language
>>> primitives that keep your data access scalable". The part that gets me wary
>>> here is how we've run into pain in the past trying to be both a database
>>> that allows more query expressivity (ALLOW FILTERING, legacy 2i come to
>>> mind) and a database that also wants horizontal scale.
>>>
>>> I'd love us to be able to have our cake and eat it too but I don't know if
>>> that's possible. So at the very least I'd advocate for SQL + CQL going
>>> forward, or SQL + a constrained "CQL-like" mode that gives the same
>>> constraints CQL does today on modeling that guide people towards that very
>>> partitionable path.
>>>
>>> On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
>>>
>>> I don’t mind us implementing some Postgres syntax support in some capacity,
>>> but I do not like the idea of limiting what Cassandra is allowed to do, or
>>> expose via CQL, to what is expressible by Postgres’s SQL.
>>>
>>> Many moons ago, before we started work on native protocol and CQL, I could
>>> perhaps see a bigger benefit to going Postgres route - for the client protocol
>>> and the language. We could piggyback on existing client infrastructure and
>>> SQL familiarity. But at this stage, when we have already made the effort to
>>> develop decent drivers, and CQL is fleshed out, and C* is quite mature
>>> overall, how much would we gain from this transition?
>>>
>>> I’m broadly with Mick here.
>>> And I support using Postgres’ SQL as
>>> inspiration for implementing new CQL features wherever it makes sense -
>>> it’s something we’ve been doing for a decade already. But I don’t believe
>>> that deprecating CQL is the way to go at this point.
>>>
>>> > On 4 Nov 2025, at 06:38, Mick <[email protected]> wrote:
>>> >
>>> >
>>> >
>>> >> On 3 Nov 2025, at 20:32, Joel Shepherd <[email protected]> wrote:
>>> >>
>>> >> At the same time, my personal opinion is that if SQL compatibility is
>>> >> pursued, then the end game should be to deprecate CQL. That will
>>> >> probably take years, but at the limit I don't see a lot of benefit to
>>> >> supporting both.
>>> >
>>> >
>>> >
>>> > We want SQL, but _why_ (in all its nuances) do we want SQL ? A lot is
>>> > obvious, but it is a very broad question.
>>> >
>>> > The adoption and standardisation benefits are obvious, but CQL has
>>> > strengths relative to SQL in Cassandra’s context.
>>> >
>>> > One is Cassandra’s wide-partition model with flexible clustering columns,
>>> > which supports very large, ordered partitions (e.g. time-series and
>>> > efficient range scans), rather than a strictly normalised, join-centric
>>> > model. These patterns don’t always map cleanly to SQL semantics, and
>>> > CQL’s query-driven, table-per-query modelling helps move users toward
>>> > designs that scale predictably.
>>> >
>>> > I can see CQL continuing as Cassandra’s high-throughput, query-driven
>>> > DSL, while we pursue SQL compatibility. I appreciate Dinesh’s ‘lanes’
>>> > framing, e.g. eventually default to a SQL interface (with Accord) for the
>>> > broadest UX, while CQL remains a high-throughput path.
>>> >
>>> > Should we also be discussing storage-engine implications ? Cassandra’s
>>> > LSMT/SSTable design optimises write paths; while SQL presents a logical
>>> > view without constraining physical layout; so data on disk stays
>>> > optimised for dominant access patterns.
>>> > I can also see the need to
>>> > discuss transport vs query languages differences.
>>> >
>>> > Are we after both SQL's DML and DDL abilities ? Beyond accessibility and
>>> > exploration, SQL often comes with mature tooling for schema change
>>> > management. Cassandra supports online schema changes (e.g., ALTER TABLE),
>>> > but cross-table/primary-key changes remain constrained. A SQL interface
>>> > alone won’t ‘solve’ this: it’s about migration tooling and engine
>>> > capabilities; changing data models at-scale faces separate challenges.
>>> >
>>> > Especially outside of early-stage apps and ad-hoc exploration I find SQL
>>> > less interesting and its ergonomics less aligned with Cassandra’s runtime
>>> > performance model. That doesn't make me opposed to the endeavour of SQL
>>> > compatibility, it pushes me on the why question a bit more for alignment
>>> > clarity to our strengths.
