I like the idea of accepting more types of queries with fewer restrictions. I think we've been moving in the right direction, with SAI opening up more types of query possibilities.
I think the long-term path towards more flexibility requires paying off some technical debt. We have a ton of places where we over-allocate or perform I/O sub-optimally, and in order to support more types of queries we need to address that. At the moment it's difficult for me to envision a path towards complex queries with multiple predicates over massive datasets without highly efficient distributed indexes and a serious reduction in allocation amplification. Here are the high-level requirements as I see them in order to get there.

1. Optimized read path with minimal garbage. This is a problem today, and it would become a bigger one as we read bigger datasets. Relying on GC to handle this for us isn't a great solution: it takes up a massive amount of CPU time, and I've found it's fairly easy to lock up an entire cluster using ZGC. We need to reduce allocations on reads; I don't see any way around this. Fortunately there's some nice stuff coming in future JDKs that allows for better memory management that we can take advantage of. Stefan just pointed this out to me a couple of days ago: https://openjdk.org/jeps/454. I think arenas would work nicely for allocations during reads as well as for memtables.

2. Optimized I/O for bulk reads. If we're going to start doing more random I/O then we need to do better about treating it as a limited resource, especially since so many teams are moving to IOPS-limited volumes like EBS. I've already filed CASSANDRA-15452 to address this from a compaction standpoint, since it's massively wasteful right now. If we're going to support inequality searches we're going to need more efficient table scans as well. We currently read chunk by chunk, which is generally only ~8KB or so, assuming 50% compression with a 16KB chunk length. CASSANDRA-15452 should make it fairly easy to address the I/O waste during compaction, and I've just filed CASSANDRA-19494, which would help with table scans.

3. Materialized views with some consistency guarantees that work well with larger partitions. The current state of MVs makes them unfit for production, since they can't be repaired and it's too easy to create MVs that ruin a cluster. I think we do need them for certain types of global queries, but not all. Anything with very high cardinality on large clusters would work better with MVs than with a round-robin of the entire cluster.

4. Secondary indexes that perform well for global searches and that scale efficiently with SSTable counts. I believe we still need a node-centric 2i that isn't SAI. I'm testing SAI with global queries tomorrow, but I can't see how queries without partition restrictions that scale O(N) with SSTable counts will be performant. We need a good indexing solution that is node-centric.

5. Pagination. Re-evaluating entire queries in order to paginate is OK for now, since there's not much to gain by materializing result sets up front. There are a couple of routes we could go down here, probably requiring multiple strategies depending on the query type and cost. This is probably most impactful if we want to do any sorting, joins, or aggregations.

I'm sure there are other things that would help, but these are the top 5 I can think of right now. I think at a bare minimum, to do a != search we'd want #1 and #2, since it would perform full cluster scans. For other types of inequality, such as c > 5, we'd need #4. #3 would make non-PK equality searches friendly to large clusters, and #5 would help with the more advanced types of SQL-ish queries.

Thanks for bringing this up, it's an interesting topic!

Jon

On Tue, Mar 26, 2024 at 8:35 AM Benjamin Lerer <ble...@apache.org> wrote:

> Hi everybody,
>
> CQL appears to be inconsistent in how it handles predicates.
>
> One type of inconsistency is that some operators can be used in some
> places but not in others, or on some expressions but not others.
> For example:
>
> - != can be used in LWT conditions but not elsewhere
> - Token expressions (e.g. token(pk) = ?) support =, >, >=, <= and <,
>   but not IN
> - Map element predicates (e.g. m[?] = ?) only support =
> - ... (a long list)
>
> This type of inconsistency can easily be fixed over time.
>
> The other type of inconsistency, which is more confusing, is about how
> we deal with combinations of multiple predicates, as we accept some and
> reject others.
> For example, we allow "token(pk) > ? AND pk IN ?" but reject "c > ? AND
> c IN ?".
> For CQL, rejection seems to be the norm, with only a few exceptions,
> whereas SQL accepts all inputs in terms of predicate combinations.
>
> For the IS NULL and IS NOT NULL predicates on partition key and
> clustering columns, which we know cannot be null, that leaves us with 2
> choices: either throw an exception when any of them is specified on
> partition key/clustering columns, or accept them, ignoring IS NOT NULL
> (as it is a no-op) and returning no rows when IS NULL is specified.
>
> I personally prefer the SQL behavior, which can result in simpler code,
> even if I can also understand the defensive approach.
>
> One way or the other, I believe it would be good to standardize on one
> approach, and I would like to hear your opinion.
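As a footnote to #1 in Jon's list: here is a minimal sketch of the arena idea from JEP 454, showing how off-heap scratch space can be scoped to the lifetime of a single read and freed in one shot when the arena closes, rather than producing per-row garbage for the GC. This is hypothetical illustration only, not Cassandra code — the class and method names are invented, and it requires JDK 22+, where the FFM API (JEP 454) is final.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ArenaReadSketch {
    // Hypothetical read path: stage per-row data in one bulk off-heap
    // allocation tied to a confined arena, instead of allocating a fresh
    // on-heap object per row.
    static long sumRowSizes(long[] rowSizes) {
        try (Arena arena = Arena.ofConfined()) {
            // One allocation sized for the whole read, not one per row.
            MemorySegment scratch =
                arena.allocate(ValueLayout.JAVA_LONG, rowSizes.length);
            for (int i = 0; i < rowSizes.length; i++) {
                scratch.setAtIndex(ValueLayout.JAVA_LONG, i, rowSizes[i]);
            }
            long total = 0;
            for (int i = 0; i < rowSizes.length; i++) {
                total += scratch.getAtIndex(ValueLayout.JAVA_LONG, i);
            }
            return total;
        } // arena closes: every segment it owns is freed at once,
          // deterministically, with nothing left for the GC to trace
    }

    public static void main(String[] args) {
        System.out.println(sumRowSizes(new long[]{100, 200, 300}));
    }
}
```

The same pattern would apply to memtables: a shared arena per memtable whose segments all die together at flush time.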