I like the idea of accepting more types of queries with fewer
restrictions.  I think we've been moving in the right direction, with SAI
opening up more types of query possibilities.

I think the long-term path toward more flexibility requires paying off
some technical debt.  We have a ton of places where we over-allocate or
perform I/O sub-optimally, and in order to support more types of queries
we need to address that.  At the moment it's difficult for me to
envision a path toward complex queries with multiple predicates over
massive datasets without highly efficient distributed indexes and a serious
reduction in allocation amplification.

Here are the high-level requirements, as I see them, for getting there.

1. Optimized read path with minimal garbage.  This is a problem today, and
it would become a bigger one as we read bigger datasets.  Relying on GC to
handle this for us isn't a great solution: it takes up a massive amount of
CPU time, and I've found it's fairly easy to lock up an entire cluster
using ZGC.  We need to reduce allocations on reads; I don't see any way
around this.  Fortunately there's some nice stuff coming in future JDKs
that allows for better memory management we can take advantage of.
Stefan just pointed this out to me a couple of days ago:
https://openjdk.org/jeps/454.  I think arenas would work nicely for
allocations during reads as well as for memtables.
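To make the arena idea concrete, here's a toy sketch in Python (not the JDK API from JEP 454; the class and sizes are made up for illustration).  The point is that every buffer allocated during a read shares one lifetime, so tearing down the arena releases everything at once instead of leaving per-buffer garbage for the collector:

```python
class Arena:
    """Toy region allocator: bump-allocate slices out of one backing
    block, then release everything in one shot when the scope ends."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)
        self._offset = 0

    def allocate(self, size):
        # Bump allocation: just advance an offset, no per-object bookkeeping.
        if self._offset + size > len(self._buf):
            raise MemoryError("arena exhausted")
        view = memoryview(self._buf)[self._offset:self._offset + size]
        self._offset += size
        return view

    def close(self):
        # One bulk release for every allocation made from this arena.
        self._buf = bytearray(0)
        self._offset = 0


# Hypothetical usage during a single read: per-row and per-index buffers
# live exactly as long as the read, then go away together.
arena = Arena(64 * 1024)
row_buf = arena.allocate(4096)
index_buf = arena.allocate(1024)
arena.close()
```

The real JEP 454 `Arena` additionally gives confined vs. shared lifetimes and off-heap memory, which is what makes it interesting for memtables.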

2. Optimized I/O for bulk reads.  If we're going to start doing more random
I/O then we need to do a better job of treating it as a limited resource,
especially since so many teams are moving to IOPS-limited volumes like EBS.
I've already filed CASSANDRA-15452 to address this from a compaction
standpoint, since compaction is massively wasteful right now.  If we're
going to support inequality searches we're going to need more efficient
table scans as well.  We currently read chunk by chunk, which is generally
only ~8KB or so, assuming 50% compression with a 16KB chunk length.
CASSANDRA-15452 should make it fairly easy to address the I/O waste during
compaction, and I've just filed CASSANDRA-19494, which would help with
table scans.
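Some back-of-the-envelope math on why chunk-by-chunk scanning hurts on IOPS-limited volumes (the 1 GiB scan size and 256 KiB bulk-read size are assumptions I picked for illustration; the ~8 KiB effective chunk is the 16 KiB chunk length at 50% compression mentioned above):

```python
# Illustrative read-op counts for scanning the same amount of data.
GIB = 1024 ** 3
scan_bytes = 1 * GIB           # assumed: 1 GiB of compressed data to scan
chunk_read = 8 * 1024          # one I/O per ~8 KiB compressed chunk
bulk_read = 256 * 1024         # assumed: one I/O per 256 KiB sequential extent

ios_chunked = scan_bytes // chunk_read   # read ops issued chunk-by-chunk
ios_bulk = scan_bytes // bulk_read       # read ops issued in bulk extents

print(ios_chunked, ios_bulk, ios_chunked // ios_bulk)  # 131072 4096 32
```

On a volume provisioned for, say, a few thousand IOPS, a 32x difference in read-op count is the difference between a scan that coexists with foreground traffic and one that starves it.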

3. Materialized views with consistency guarantees that also work well with
larger partitions.  Their current state makes them unfit for production,
since they can't be repaired and it's too easy to create MVs that ruin a
cluster.  I think we do need them for certain types of global queries, but
not all.  Anything with very high cardinality on a large cluster would
work better with MVs than with a round-robin scan of the entire cluster.

4. Secondary indexes that perform well for global searches and that scale
efficiently with SSTable counts.  I believe we still need a node-centric 2i
that isn't SAI.  I'm testing SAI with global queries tomorrow, but I can't
see how queries without partition restrictions, whose cost scales O(N)
with SSTable count, will be performant.  We need a good indexing solution
that is node-centric.
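The scaling concern can be sketched with a simple cost model (the 0.5 ms per-probe cost is an assumed, purely illustrative number, not a measurement):

```python
# Illustrative model of per-SSTable vs. node-centric index cost.
PROBE_MS = 0.5  # assumed cost of one index probe, in milliseconds

def per_sstable_index_cost(sstables):
    # An unrestricted query must probe the index attached to every
    # SSTable, so cost grows linearly: O(N) in SSTable count.
    return sstables * PROBE_MS

def node_centric_index_cost(sstables):
    # A single node-wide index needs one probe regardless of how many
    # SSTables back the data.
    return PROBE_MS

assert per_sstable_index_cost(4) == 2.0
assert per_sstable_index_cost(40) == 20.0   # 10x the SSTables, 10x the cost
assert node_centric_index_cost(40) == 0.5   # flat as SSTables accumulate
```

Real index probes aren't uniform in cost, but the shape of the curve is the point: anything that multiplies per-query work by SSTable count degrades exactly when compaction falls behind.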

5. Pagination.  Re-evaluating entire queries in order to paginate is OK for
now, since there's not much to gain by materializing result sets up front.
There are a couple of routes we could go down here, probably requiring
multiple strategies depending on query type and cost.  This is probably
most impactful if we want to do any sorting, joins, or aggregations.

I'm sure there are other things that would help, but these are the top 5 I
can think of right now.  I think at a bare minimum, to support != searches
we'd want #1 and #2, since those searches would perform full cluster scans.
For other types of inequality, such as c > 5, we'd need #4.  #3 would make
non-PK equality searches friendly to large clusters, and #5 would help
with the more advanced types of SQL-ish queries.

Thanks for bringing this up, it's an interesting topic!
Jon


On Tue, Mar 26, 2024 at 8:35 AM Benjamin Lerer <ble...@apache.org> wrote:

> Hi everybody,
>
> CQL appears to be inconsistent in how it handles predicates.
>
> One type of inconsistency is that some operators can be used in some
> places but not in others, or on some expressions but not others.
> For example:
>
>    - != can be used in LWT conditions but not elsewhere
>    - Token expressions (e.g. token(pk) = ?) support =, >, >=, <= and <
>    but not IN
>    - Map elements predicates (e.g m[?] = ?) only support =
>    - ... (a long list)
>
> This type of inconsistency can easily be fixed over time.
>
> The other type of inconsistency, which is more confusing, concerns how
> we deal with combinations of multiple predicates, since we accept some
> and reject others.
> For example, we allow "token(pk) > ? AND pk IN ?" but reject "c > ? AND
> c IN ?".
> In CQL, rejection seems to be the norm, with only a few exceptions,
> whereas SQL accepts all combinations of predicates.
>
> For the IS NULL and IS NOT NULL predicates on partition key and
> clustering columns, which we know cannot be null, that leaves us with
> two choices: either throwing an exception when any of them is specified
> on partition keys/clustering columns, or accepting them, ignoring
> IS NOT NULL as a no-op and returning no rows when IS NULL is specified.
>
> I personally prefer the SQL behavior, which can result in simpler code,
> even if I can also understand the defensive approach.
>
> One way or the other, I believe that it would be good to standardize on
> one approach and would like to hear your opinion.
>
