[jira] [Comment Edited] (CASSANDRA-20374) CEP-45: Index read support for mutation tracking

Caleb Rackliffe (Jira) Thu, 06 Nov 2025 14:38:48 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003634#comment-18003634
 ]


Caleb Rackliffe edited comment on CASSANDRA-20374 at 11/6/25 10:37 PM:
-----------------------------------------------------------------------

Here are my notes from a first pass at review...

*Documentation*
 - I made a pass at letting Claude generate some JavaDoc for the new 
{{MultiStepSearcher}} interface and the {{PartialTrackedRead}} -> 
{{AbstractPartialTrackedRead}} -> {{PartialTrackedIndexRead}} hierarchy. My 
impression is that some of it could be useful, but I'm also open to waiting 
until more of the overall CEP implementation stabilizes. Either way, let me 
know if you want to collaborate on this.

*Testing*
 - -We might be able to parameterize {{PartialUpdateHandlingTest}} with 
replication type in a similar fashion to what this patch already does in 
{{{}StrictFilteringTest{}}}.-
 - -The refactoring to the existing SAI logic will be covered well enough as it 
is, although some longer runs than what we have in CI for 
{{{}MultiNodeSAITest{}}}, {{{}MultiNodeTableWalkWithoutReadRepairTest{}}}, and 
{{MultiNodeTableWalkWithReadRepairTest}} are warranted. The more important 
point is that we need to expand those (or at least the 
{{MultiNodeTableWalkBase}} stuff) to cover tables with mutation tracking 
enabled. (CC- [~dcapwell]{-}){-}
 - -Should {{ReadRepairEmptyRangeTombstonesTestBase}} cover tables w/ mutation 
tracking?-
 - I'm wondering how necessary the {{AbstractTester#queryColumns()}} logic for 
overwriting rows when {{replicationType.isTracked()}} is. It seems like the 
tests (ex. {{{}ReadRepairIndexTest#rangeReadTest(){}}}) already account for 
this by supplying the full set of rows. Also, in the "tracked" case, 
{{verifyQuery()}} doesn't look like it really does much? (i.e. Everything is 
already in sync when we call {{{}verifyQuery(){}}}?)
 - Similar thing in {{AbstractTester#tearDown()}} where we make adjustments. I 
played w/ just throwing this (in addition to removing the row reassignments in 
there and in {{{}queryColumns(){}}}) at the start of {{verifyQuery()}} and it 
seems to work:
{noformat}
if (replicationType.isTracked())
{
    assertRows(cluster.get(1).executeInternal(query), allRows);
    assertRows(cluster.get(2).executeInternal(query), allRows);
    return self();
}
{noformat}

 - -In {{ReadRepairIndexTest$Tester#createIndex()}} you might eventually have a 
situation where {{default_secondary_index}} gets set to "sai". If that happens, 
{{SECONDARY}} will duplicate {{{}SAI{}}}. You can use {{CassandraIndex.NAME}} 
and force it to use the legacy 2i, etc.-

*{{RowFilter}}*
 - -This class needs a rebase, as a few important fixes have been made 
recently, like CASSANDRA-20566 and CASSANDRA-20541.-

*{{Memtable}}*
 - {{snapshotPartition()}} is just exposing/wrapping getPartition() in all 
implementations outside tests. Wondering why we didn't just make 
{{getPartition()}} public? Also, is what we get back actually a snapshot? 
Aren't we just getting an {{AtomicBTreePartition}} from 
{{{}SkipListMemtable{}}}, but the {{ref}} could be updated in 
{{{}AtomicBTreePartition{}}}?

*{{CassandraIndexSearcher}}*
 - -Looks like {{queryDataFromIndex()}} and its implementations are unused now?-

*{{{}RegularColumnIndex{}}}, {{{}ClusteringColumnIndex{}}}, et al*
 - {{decodeEntry()}} looks like it involves having to create a DK object it 
didn't before, but it's probably small enough not to be too visible in a 
profile.

*{{ShardReplicatedOffsets}}*
 - -No love for the TRACE logging for {{{}verbHandler{}}}?-

*{{Index}}*
 - -Is {{supportsMutationTracking()}} unused?-

*{{ReadCommand}}*
 - -In {{{}ReadCompleter.TRACKED#complete(){}}}, the {{IllegalStateException}} 
feels more like an assertion/programming error than a state that we shouldn't 
see.-

*{{{}StorageAttachedIndexSearcher{}}}, {{QueryController}}*
 - -Using the natural order means {{{}PrimaryKey#compareTo(){}}}, which is 
"strict", and will treat static column index matches differently than normal 
column index matches. (i.e. STATIC will sort at the start of the partition.) 
Make sure that's what we want. There are lots of existing examples of 
static/wide intersection testing. ({{{}StaticColumnIndexTest{}}} locally and 
then {{MultiNodeTableWalkWithoutReadRepairTest}} for multi-node, etc.)-
 - -All of this work was done before the enhancements in CASSANDRA-20191 for 
skipping within a partition, so we'll have to rebase on that and make sure 
things still work.-
 - We appear to have moved the post-filtering row read batching to the second 
step of the searcher process now, which should still be fine...just want to 
confirm.
 - -In {{{}MatchIterator#computeNext(){}}}, I think we can move the {{null}} 
check for {{resultKeyIterator}} of of the loop and the skipping as well, as 
they don't actually matter after the first call to {{{}computeNext(){}}}.-
 - -I'm not 100% sure that leaving {{hasUnrepairedMatches}} alone in the 
tracked reads case is the best way to manage what kind of filtering happens at 
different steps of the multi-step read. It doesn't matter for the first stage, 
where we're just getting the initial local index matches, as no row filtering 
occurs there anyway. When we get to the step that uses {{{}MatchIndexer{}}}, we 
should assess incoming mutations based on whether the original {{ReadCommand}} 
decides on strict filtering. In other words, we could actually always set 
{{hasUnrepairedMatches = true}} to preserve that intent, and the {{FilterTree}} 
you're already building in the searcher constructor would do what it should. We 
shouldn't force strict filtering on the {{{}MatchIndexer{}}}, which is what 
looks to be happening now, and would (I think) be incorrect. There are a couple 
other things in {{PartialTrackedIndexRead}} to discuss around what kind of 
filtering we're applying as well...-

*{{PartialTrackedIndexRead}}*
 - Claude is worried (anthropomorphizing, I know) about {{prepareInternal()}} 
holding a large {{TreeSet}} of {{IndexMatch}} objects on the heap for 
individual queries. There is a guardrail we have in RFP to limit this kind of 
thing w/ warning and failure thresholds. (see 
{{{}replica_filtering_protection.cached_rows_warn_threshold{}}}) The other 
thing I'm curious about is the {{TreeSet}} doing extra sorting work when the 
SAI match iterator should already produce a sorted stream of 
{{{}PrimaryKey{}}}. It looks like this set is later directly added to during 
augmentation, so I'm guessing that's the point, but if there is minimal 
augmentation, perhaps it would be cheaper to zip that smaller sorted set into 
the primary match iterator.
 - -In {{{}prepareInternal(){}}}, there's this: {{{}// TODO (now): make 
decorated key part of IndexEntry interface{}}}. Did it mean {{{}IndexMatch{}}}? 
is it already done?-
 - -In {{{}FollowUpRead{}}}, {{{}promise{}}}, {{{}consistencyLevel{}}}, and 
{{expiresAtNanos}} are either unused or can be local-
 - -{{Preconditions.checkArgument(minIdx >= 0);}} in 
{{MergingMatchIterator#computeNext()}} could probably just be an assertion, 
given it's guaranteed to be true.-
 - nit: {{SnapshotView}} and {{IndexpartitionRead}} assume a single DK, but 
they also have methods that take arbitrary partition updates. There are 
defensive checks in place to make sure those things don't get mixed up, but I 
wonder if there's a better way...
 - nit: {{prepareInternal()}} in the indexed case doesn't use 
{{{}initialData{}}}. I guess we could codify that expectation and throw IAE 
otherwise.
 - nit: General naming ambiguities, things like {{IndexCompletedRead}} and 
{{CompletedIndexRead}} were there is only one implementation of the interface.
 - -{{IndexCompletedRead.UnfilteredResultIterator}} can't be static because of 
type args, so should we just use the {{followUpReads}} from the enclosing 
class?-
 - {{IndexPreComplete#preComplete()}} and it's abstract version in the super 
don't do anything.
 - -Let's return to our conversation around post-filtering and 
{{{}hasUnrepairedMatches{}}}. In {{PTIR}} we filter in three places I can see, 
and all of them look to be over completed/augmented data. This means that we 
should be using strict filtering for all of them. Right now, this actually 
might work, because we leave {{hasUnrepairedMatches = false}} for the tracked 
case and the existing {{filterTree}} respects it, but if we correct this (see 
the comment above), we would have make sure the {{FilterTree}} we use here is 
explicitly made strict, just as we force it in RFP at the coordinator (see 
{{{}filterReplicaFilteringProtection(){}}}).-
 - -We post-filter with {{readHit()}} in {{{}UnfilteredResultIterator{}}}, but 
then again in {{filter()}} via {{{}filterCompletedRead(){}}}. Has any of the 
underlying data changed there between the {{readHit()}} call with an augmented 
{{View}} and {{{}filterCompletedRead(){}}}? If not, are we filtering twice? 
(Again, if the data is completed in both cases, the filtering should be 
explicitly strict.)-

[~bdeggleston] Let's discuss when you're back from your break if you have some 
time.


was (Author: maedhroz):
Here are my notes from a first pass at review...

*Documentation*
 - I made a pass at letting Claude generate some JavaDoc for the new 
{{MultiStepSearcher}} interface and the {{PartialTrackedRead}} -> 
{{AbstractPartialTrackedRead}} -> {{PartialTrackedIndexRead}} hierarchy. My 
impression is that some of it could be useful, but I'm also open to waiting 
until more of the overall CEP implementation stabilizes. Either way, let me 
know if you want to collaborate on this.

*Testing*
 - -We might be able to parameterize {{PartialUpdateHandlingTest}} with 
replication type in a similar fashion to what this patch already does in 
{{{}StrictFilteringTest{}}}.-
 - -The refactoring to the existing SAI logic will be covered well enough as it 
is, although some longer runs than what we have in CI for 
{{{}MultiNodeSAITest{}}}, {{{}MultiNodeTableWalkWithoutReadRepairTest{}}}, and 
{{MultiNodeTableWalkWithReadRepairTest}} are warranted. The more important 
point is that we need to expand those (or at least the 
{{MultiNodeTableWalkBase}} stuff) to cover tables with mutation tracking 
enabled. (CC- [~dcapwell]{-}){-}
 - -Should {{ReadRepairEmptyRangeTombstonesTestBase}} cover tables w/ mutation 
tracking?-
 - I'm wondering how necessary the {{AbstractTester#queryColumns()}} logic for 
overwriting rows when {{replicationType.isTracked()}} is. It seems like the 
tests (ex. {{{}ReadRepairIndexTest#rangeReadTest(){}}}) already account for 
this by supplying the full set of rows. Also, in the "tracked" case, 
{{verifyQuery()}} doesn't look like it really does much? (i.e. Everything is 
already in sync when we call {{{}verifyQuery(){}}}?)
 - Similar thing in {{AbstractTester#tearDown()}} where we make adjustments. I 
played w/ just throwing this (in addition to removing the row reassignments in 
there and in {{{}queryColumns(){}}}) at the start of {{verifyQuery()}} and it 
seems to work:
{noformat}
if (replicationType.isTracked())
{
    assertRows(cluster.get(1).executeInternal(query), allRows);
    assertRows(cluster.get(2).executeInternal(query), allRows);
    return self();
}
{noformat}

 - -In {{ReadRepairIndexTest$Tester#createIndex()}} you might eventually have a 
situation where {{default_secondary_index}} gets set to "sai". If that happens, 
{{SECONDARY}} will duplicate {{{}SAI{}}}. You can use {{CassandraIndex.NAME}} 
and force it to use the legacy 2i, etc.-

*{{RowFilter}}*
 - -This class needs a rebase, as a few important fixes have been made 
recently, like CASSANDRA-20566 and CASSANDRA-20541.-

*{{Memtable}}*
 - {{snapshotPartition()}} is just exposing/wrapping getPartition() in all 
implementations outside tests. Wondering why we didn't just make 
{{getPartition()}} public? Also, is what we get back actually a snapshot? 
Aren't we just getting an {{AtomicBTreePartition}} from 
{{{}SkipListMemtable{}}}, but the {{ref}} could be updated in 
{{{}AtomicBTreePartition{}}}?

*{{CassandraIndexSearcher}}*
 - -Looks like {{queryDataFromIndex()}} and its implementations are unused now?-

*{{{}RegularColumnIndex{}}}, {{{}ClusteringColumnIndex{}}}, et al*
 - {{decodeEntry()}} looks like it involves having to create a DK object it 
didn't before, but it's probably small enough not to be too visible in a 
profile.

*{{ShardReplicatedOffsets}}*
 - -No love for the TRACE logging for {{{}verbHandler{}}}?-

*{{Index}}*
 - -Is {{supportsMutationTracking()}} unused?-

*{{ReadCommand}}*
 - -In {{{}ReadCompleter.TRACKED#complete(){}}}, the {{IllegalStateException}} 
feels more like an assertion/programming error than a state that we shouldn't 
see.-

*{{{}StorageAttachedIndexSearcher{}}}, {{QueryController}}*
 - -Using the natural order means {{{}PrimaryKey#compareTo(){}}}, which is 
"strict", and will treat static column index matches differently than normal 
column index matches. (i.e. STATIC will sort at the start of the partition.) 
Make sure that's what we want. There are lots of existing examples of 
static/wide intersection testing. ({{{}StaticColumnIndexTest{}}} locally and 
then {{MultiNodeTableWalkWithoutReadRepairTest}} for multi-node, etc.)-
 - -All of this work was done before the enhancements in CASSANDRA-20191 for 
skipping within a partition, so we'll have to rebase on that and make sure 
things still work.-
 - We appear to have moved the post-filtering row read batching to the second 
step of the searcher process now, which should still be fine...just want to 
confirm.
 - -In {{{}MatchIterator#computeNext(){}}}, I think we can move the {{null}} 
check for {{resultKeyIterator}} of of the loop and the skipping as well, as 
they don't actually matter after the first call to {{{}computeNext(){}}}.-
 - -I'm not 100% sure that leaving {{hasUnrepairedMatches}} alone in the 
tracked reads case is the best way to manage what kind of filtering happens at 
different steps of the multi-step read. It doesn't matter for the first stage, 
where we're just getting the initial local index matches, as no row filtering 
occurs there anyway. When we get to the step that uses {{{}MatchIndexer{}}}, we 
should assess incoming mutations based on whether the original {{ReadCommand}} 
decides on strict filtering. In other words, we could actually always set 
{{hasUnrepairedMatches = true}} to preserve that intent, and the {{FilterTree}} 
you're already building in the searcher constructor would do what it should. We 
shouldn't force strict filtering on the {{{}MatchIndexer{}}}, which is what 
looks to be happening now, and would (I think) be incorrect. There are a couple 
other things in {{PartialTrackedIndexRead}} to discuss around what kind of 
filtering we're applying as well...-

*{{PartialTrackedIndexRead}}*
 - Claude is worried (anthropomorphizing, I know) about {{prepareInternal()}} 
holding a large {{TreeSet}} of {{IndexMatch}} objects on the heap for 
individual queries. There is a guardrail we have in RFP to limit this kind of 
thing w/ warning and failure thresholds. (see 
{{{}replica_filtering_protection.cached_rows_warn_threshold{}}}) The other 
thing I'm curious about is the {{TreeSet}} doing extra sorting work when the 
SAI match iterator should already produce a sorted stream of 
{{{}PrimaryKey{}}}. It looks like this set is later directly added to during 
augmentation, so I'm guessing that's the point, but if there is minimal 
augmentation, perhaps it would be cheaper to zip that smaller sorted set into 
the primary match iterator.
 - -In {{{}prepareInternal(){}}}, there's this: {{{}// TODO (now): make 
decorated key part of IndexEntry interface{}}}. Did it mean {{{}IndexMatch{}}}? 
is it already done?-
 - -In {{{}FollowUpRead{}}}, {{{}promise{}}}, {{{}consistencyLevel{}}}, and 
{{expiresAtNanos}} are either unused or can be local-
 - -{{Preconditions.checkArgument(minIdx >= 0);}} in 
{{MergingMatchIterator#computeNext()}} could probably just be an assertion, 
given it's guaranteed to be true.-
 - nit: {{SnapshotView}} and {{IndexpartitionRead}} assume a single DK, but 
they also have methods that take arbitrary partition updates. There are 
defensive checks in place to make sure those things don't get mixed up, but I 
wonder if there's a better way...
 - nit: {{prepareInternal()}} in the indexed case doesn't use 
{{{}initialData{}}}. I guess we could codify that expectation and throw IAE 
otherwise.
 - nit: General naming ambiguities, things like {{IndexCompletedRead}} and 
{{CompletedIndexRead}} were there is only one implementation of the interface.
 - {{IndexCompletedRead.UnfilteredResultIterator}} can't be static because of 
type args, so should we just use the {{followUpReads}} from the enclosing class?
 - {{IndexPreComplete#preComplete()}} and it's abstract version in the super 
don't do anything.
 - -Let's return to our conversation around post-filtering and 
{{{}hasUnrepairedMatches{}}}. In {{PTIR}} we filter in three places I can see, 
and all of them look to be over completed/augmented data. This means that we 
should be using strict filtering for all of them. Right now, this actually 
might work, because we leave {{hasUnrepairedMatches = false}} for the tracked 
case and the existing {{filterTree}} respects it, but if we correct this (see 
the comment above), we would have make sure the {{FilterTree}} we use here is 
explicitly made strict, just as we force it in RFP at the coordinator (see 
{{{}filterReplicaFilteringProtection(){}}}).-
 - -We post-filter with {{readHit()}} in {{{}UnfilteredResultIterator{}}}, but 
then again in {{filter()}} via {{{}filterCompletedRead(){}}}. Has any of the 
underlying data changed there between the {{readHit()}} call with an augmented 
{{View}} and {{{}filterCompletedRead(){}}}? If not, are we filtering twice? 
(Again, if the data is completed in both cases, the filtering should be 
explicitly strict.)-

[~bdeggleston] Let's discuss when you're back from your break if you have some 
time.

> CEP-45: Index read support for mutation tracking
> ------------------------------------------------
>
>                 Key: CASSANDRA-20374
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20374
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Consistency/Coordination
>            Reporter: Blake Eggleston
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Index reads are disabled for mutation tracking and we need to add support for 
> them. A big part of this will be adapting the ReplicaFilteringProtection to 
> work with the logged read classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (CASSANDRA-20374) CEP-45: Index read support for mutation tracking

Reply via email to