[
https://issues.apache.org/jira/browse/CASSANDRA-20374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003634#comment-18003634
]
Caleb Rackliffe edited comment on CASSANDRA-20374 at 11/6/25 10:37 PM:
-----------------------------------------------------------------------
Here are my notes from a first pass at review...
*Documentation*
- I made a pass at letting Claude generate some JavaDoc for the new
{{MultiStepSearcher}} interface and the {{PartialTrackedRead}} ->
{{AbstractPartialTrackedRead}} -> {{PartialTrackedIndexRead}} hierarchy. My
impression is that some of it could be useful, but I'm also open to waiting
until more of the overall CEP implementation stabilizes. Either way, let me
know if you want to collaborate on this.
*Testing*
- -We might be able to parameterize {{PartialUpdateHandlingTest}} with
replication type in a similar fashion to what this patch already does in
{{{}StrictFilteringTest{}}}.-
- -The refactoring to the existing SAI logic will be covered well enough as it
is, although some longer runs than what we have in CI for
{{{}MultiNodeSAITest{}}}, {{{}MultiNodeTableWalkWithoutReadRepairTest{}}}, and
{{MultiNodeTableWalkWithReadRepairTest}} are warranted. The more important
point is that we need to expand those (or at least the
{{MultiNodeTableWalkBase}} stuff) to cover tables with mutation tracking
enabled. (CC- [~dcapwell]{-}){-}
- -Should {{ReadRepairEmptyRangeTombstonesTestBase}} cover tables w/ mutation
tracking?-
- I'm wondering how necessary the {{AbstractTester#queryColumns()}} logic for
overwriting rows when {{replicationType.isTracked()}} is. It seems like the
tests (ex. {{{}ReadRepairIndexTest#rangeReadTest(){}}}) already account for
this by supplying the full set of rows. Also, in the "tracked" case,
{{verifyQuery()}} doesn't look like it really does much? (i.e. Everything is
already in sync when we call {{{}verifyQuery(){}}}?)
- Similar thing in {{AbstractTester#tearDown()}} where we make adjustments. I
played w/ just throwing this (in addition to removing the row reassignments in
there and in {{{}queryColumns(){}}}) at the start of {{verifyQuery()}} and it
seems to work:
{noformat}
if (replicationType.isTracked())
{
assertRows(cluster.get(1).executeInternal(query), allRows);
assertRows(cluster.get(2).executeInternal(query), allRows);
return self();
}
{noformat}
- -In {{ReadRepairIndexTest$Tester#createIndex()}} you might eventually have a
situation where {{default_secondary_index}} gets set to "sai". If that happens,
{{SECONDARY}} will duplicate {{{}SAI{}}}. You can use {{CassandraIndex.NAME}}
and force it to use the legacy 2i, etc.-
*{{RowFilter}}*
- -This class needs a rebase, as a few important fixes have been made
recently, like CASSANDRA-20566 and CASSANDRA-20541.-
*{{Memtable}}*
- {{snapshotPartition()}} is just exposing/wrapping getPartition() in all
implementations outside tests. Wondering why we didn't just make
{{getPartition()}} public? Also, is what we get back actually a snapshot?
Aren't we just getting an {{AtomicBTreePartition}} from
{{{}SkipListMemtable{}}}, but the {{ref}} could be updated in
{{{}AtomicBTreePartition{}}}?
*{{CassandraIndexSearcher}}*
- -Looks like {{queryDataFromIndex()}} and its implementations are unused now?-
*{{{}RegularColumnIndex{}}}, {{{}ClusteringColumnIndex{}}}, et al*
- {{decodeEntry()}} looks like it involves having to create a DK object it
didn't before, but it's probably small enough not to be too visible in a
profile.
*{{ShardReplicatedOffsets}}*
- -No love for the TRACE logging for {{{}verbHandler{}}}?-
*{{Index}}*
- -Is {{supportsMutationTracking()}} unused?-
*{{ReadCommand}}*
- -In {{{}ReadCompleter.TRACKED#complete(){}}}, the {{IllegalStateException}}
feels more like an assertion/programming error than a state that we shouldn't
see.-
*{{{}StorageAttachedIndexSearcher{}}}, {{QueryController}}*
- -Using the natural order means {{{}PrimaryKey#compareTo(){}}}, which is
"strict", and will treat static column index matches differently than normal
column index matches. (i.e. STATIC will sort at the start of the partition.)
Make sure that's what we want. There are lots of existing examples of
static/wide intersection testing. ({{{}StaticColumnIndexTest{}}} locally and
then {{MultiNodeTableWalkWithoutReadRepairTest}} for multi-node, etc.)-
- -All of this work was done before the enhancements in CASSANDRA-20191 for
skipping within a partition, so we'll have to rebase on that and make sure
things still work.-
- We appear to have moved the post-filtering row read batching to the second
step of the searcher process now, which should still be fine...just want to
confirm.
- -In {{{}MatchIterator#computeNext(){}}}, I think we can move the {{null}}
check for {{resultKeyIterator}} of of the loop and the skipping as well, as
they don't actually matter after the first call to {{{}computeNext(){}}}.-
- -I'm not 100% sure that leaving {{hasUnrepairedMatches}} alone in the
tracked reads case is the best way to manage what kind of filtering happens at
different steps of the multi-step read. It doesn't matter for the first stage,
where we're just getting the initial local index matches, as no row filtering
occurs there anyway. When we get to the step that uses {{{}MatchIndexer{}}}, we
should assess incoming mutations based on whether the original {{ReadCommand}}
decides on strict filtering. In other words, we could actually always set
{{hasUnrepairedMatches = true}} to preserve that intent, and the {{FilterTree}}
you're already building in the searcher constructor would do what it should. We
shouldn't force strict filtering on the {{{}MatchIndexer{}}}, which is what
looks to be happening now, and would (I think) be incorrect. There are a couple
other things in {{PartialTrackedIndexRead}} to discuss around what kind of
filtering we're applying as well...-
*{{PartialTrackedIndexRead}}*
- Claude is worried (anthropomorphizing, I know) about {{prepareInternal()}}
holding a large {{TreeSet}} of {{IndexMatch}} objects on the heap for
individual queries. There is a guardrail we have in RFP to limit this kind of
thing w/ warning and failure thresholds. (see
{{{}replica_filtering_protection.cached_rows_warn_threshold{}}}) The other
thing I'm curious about is the {{TreeSet}} doing extra sorting work when the
SAI match iterator should already produce a sorted stream of
{{{}PrimaryKey{}}}. It looks like this set is later directly added to during
augmentation, so I'm guessing that's the point, but if there is minimal
augmentation, perhaps it would be cheaper to zip that smaller sorted set into
the primary match iterator.
- -In {{{}prepareInternal(){}}}, there's this: {{{}// TODO (now): make
decorated key part of IndexEntry interface{}}}. Did it mean {{{}IndexMatch{}}}?
is it already done?-
- -In {{{}FollowUpRead{}}}, {{{}promise{}}}, {{{}consistencyLevel{}}}, and
{{expiresAtNanos}} are either unused or can be local-
- -{{Preconditions.checkArgument(minIdx >= 0);}} in
{{MergingMatchIterator#computeNext()}} could probably just be an assertion,
given it's guaranteed to be true.-
- nit: {{SnapshotView}} and {{IndexpartitionRead}} assume a single DK, but
they also have methods that take arbitrary partition updates. There are
defensive checks in place to make sure those things don't get mixed up, but I
wonder if there's a better way...
- nit: {{prepareInternal()}} in the indexed case doesn't use
{{{}initialData{}}}. I guess we could codify that expectation and throw IAE
otherwise.
- nit: General naming ambiguities, things like {{IndexCompletedRead}} and
{{CompletedIndexRead}} were there is only one implementation of the interface.
- -{{IndexCompletedRead.UnfilteredResultIterator}} can't be static because of
type args, so should we just use the {{followUpReads}} from the enclosing
class?-
- {{IndexPreComplete#preComplete()}} and it's abstract version in the super
don't do anything.
- -Let's return to our conversation around post-filtering and
{{{}hasUnrepairedMatches{}}}. In {{PTIR}} we filter in three places I can see,
and all of them look to be over completed/augmented data. This means that we
should be using strict filtering for all of them. Right now, this actually
might work, because we leave {{hasUnrepairedMatches = false}} for the tracked
case and the existing {{filterTree}} respects it, but if we correct this (see
the comment above), we would have make sure the {{FilterTree}} we use here is
explicitly made strict, just as we force it in RFP at the coordinator (see
{{{}filterReplicaFilteringProtection(){}}}).-
- -We post-filter with {{readHit()}} in {{{}UnfilteredResultIterator{}}}, but
then again in {{filter()}} via {{{}filterCompletedRead(){}}}. Has any of the
underlying data changed there between the {{readHit()}} call with an augmented
{{View}} and {{{}filterCompletedRead(){}}}? If not, are we filtering twice?
(Again, if the data is completed in both cases, the filtering should be
explicitly strict.)-
[~bdeggleston] Let's discuss when you're back from your break if you have some
time.
was (Author: maedhroz):
Here are my notes from a first pass at review...
*Documentation*
- I made a pass at letting Claude generate some JavaDoc for the new
{{MultiStepSearcher}} interface and the {{PartialTrackedRead}} ->
{{AbstractPartialTrackedRead}} -> {{PartialTrackedIndexRead}} hierarchy. My
impression is that some of it could be useful, but I'm also open to waiting
until more of the overall CEP implementation stabilizes. Either way, let me
know if you want to collaborate on this.
*Testing*
- -We might be able to parameterize {{PartialUpdateHandlingTest}} with
replication type in a similar fashion to what this patch already does in
{{{}StrictFilteringTest{}}}.-
- -The refactoring to the existing SAI logic will be covered well enough as it
is, although some longer runs than what we have in CI for
{{{}MultiNodeSAITest{}}}, {{{}MultiNodeTableWalkWithoutReadRepairTest{}}}, and
{{MultiNodeTableWalkWithReadRepairTest}} are warranted. The more important
point is that we need to expand those (or at least the
{{MultiNodeTableWalkBase}} stuff) to cover tables with mutation tracking
enabled. (CC- [~dcapwell]{-}){-}
- -Should {{ReadRepairEmptyRangeTombstonesTestBase}} cover tables w/ mutation
tracking?-
- I'm wondering how necessary the {{AbstractTester#queryColumns()}} logic for
overwriting rows when {{replicationType.isTracked()}} is. It seems like the
tests (ex. {{{}ReadRepairIndexTest#rangeReadTest(){}}}) already account for
this by supplying the full set of rows. Also, in the "tracked" case,
{{verifyQuery()}} doesn't look like it really does much? (i.e. Everything is
already in sync when we call {{{}verifyQuery(){}}}?)
- Similar thing in {{AbstractTester#tearDown()}} where we make adjustments. I
played w/ just throwing this (in addition to removing the row reassignments in
there and in {{{}queryColumns(){}}}) at the start of {{verifyQuery()}} and it
seems to work:
{noformat}
if (replicationType.isTracked())
{
assertRows(cluster.get(1).executeInternal(query), allRows);
assertRows(cluster.get(2).executeInternal(query), allRows);
return self();
}
{noformat}
- -In {{ReadRepairIndexTest$Tester#createIndex()}} you might eventually have a
situation where {{default_secondary_index}} gets set to "sai". If that happens,
{{SECONDARY}} will duplicate {{{}SAI{}}}. You can use {{CassandraIndex.NAME}}
and force it to use the legacy 2i, etc.-
*{{RowFilter}}*
- -This class needs a rebase, as a few important fixes have been made
recently, like CASSANDRA-20566 and CASSANDRA-20541.-
*{{Memtable}}*
- {{snapshotPartition()}} is just exposing/wrapping getPartition() in all
implementations outside tests. Wondering why we didn't just make
{{getPartition()}} public? Also, is what we get back actually a snapshot?
Aren't we just getting an {{AtomicBTreePartition}} from
{{{}SkipListMemtable{}}}, but the {{ref}} could be updated in
{{{}AtomicBTreePartition{}}}?
*{{CassandraIndexSearcher}}*
- -Looks like {{queryDataFromIndex()}} and its implementations are unused now?-
*{{{}RegularColumnIndex{}}}, {{{}ClusteringColumnIndex{}}}, et al*
- {{decodeEntry()}} looks like it involves having to create a DK object it
didn't before, but it's probably small enough not to be too visible in a
profile.
*{{ShardReplicatedOffsets}}*
- -No love for the TRACE logging for {{{}verbHandler{}}}?-
*{{Index}}*
- -Is {{supportsMutationTracking()}} unused?-
*{{ReadCommand}}*
- -In {{{}ReadCompleter.TRACKED#complete(){}}}, the {{IllegalStateException}}
feels more like an assertion/programming error than a state that we shouldn't
see.-
*{{{}StorageAttachedIndexSearcher{}}}, {{QueryController}}*
- -Using the natural order means {{{}PrimaryKey#compareTo(){}}}, which is
"strict", and will treat static column index matches differently than normal
column index matches. (i.e. STATIC will sort at the start of the partition.)
Make sure that's what we want. There are lots of existing examples of
static/wide intersection testing. ({{{}StaticColumnIndexTest{}}} locally and
then {{MultiNodeTableWalkWithoutReadRepairTest}} for multi-node, etc.)-
- -All of this work was done before the enhancements in CASSANDRA-20191 for
skipping within a partition, so we'll have to rebase on that and make sure
things still work.-
- We appear to have moved the post-filtering row read batching to the second
step of the searcher process now, which should still be fine...just want to
confirm.
- -In {{{}MatchIterator#computeNext(){}}}, I think we can move the {{null}}
check for {{resultKeyIterator}} of of the loop and the skipping as well, as
they don't actually matter after the first call to {{{}computeNext(){}}}.-
- -I'm not 100% sure that leaving {{hasUnrepairedMatches}} alone in the
tracked reads case is the best way to manage what kind of filtering happens at
different steps of the multi-step read. It doesn't matter for the first stage,
where we're just getting the initial local index matches, as no row filtering
occurs there anyway. When we get to the step that uses {{{}MatchIndexer{}}}, we
should assess incoming mutations based on whether the original {{ReadCommand}}
decides on strict filtering. In other words, we could actually always set
{{hasUnrepairedMatches = true}} to preserve that intent, and the {{FilterTree}}
you're already building in the searcher constructor would do what it should. We
shouldn't force strict filtering on the {{{}MatchIndexer{}}}, which is what
looks to be happening now, and would (I think) be incorrect. There are a couple
other things in {{PartialTrackedIndexRead}} to discuss around what kind of
filtering we're applying as well...-
*{{PartialTrackedIndexRead}}*
- Claude is worried (anthropomorphizing, I know) about {{prepareInternal()}}
holding a large {{TreeSet}} of {{IndexMatch}} objects on the heap for
individual queries. There is a guardrail we have in RFP to limit this kind of
thing w/ warning and failure thresholds. (see
{{{}replica_filtering_protection.cached_rows_warn_threshold{}}}) The other
thing I'm curious about is the {{TreeSet}} doing extra sorting work when the
SAI match iterator should already produce a sorted stream of
{{{}PrimaryKey{}}}. It looks like this set is later directly added to during
augmentation, so I'm guessing that's the point, but if there is minimal
augmentation, perhaps it would be cheaper to zip that smaller sorted set into
the primary match iterator.
- -In {{{}prepareInternal(){}}}, there's this: {{{}// TODO (now): make
decorated key part of IndexEntry interface{}}}. Did it mean {{{}IndexMatch{}}}?
is it already done?-
- -In {{{}FollowUpRead{}}}, {{{}promise{}}}, {{{}consistencyLevel{}}}, and
{{expiresAtNanos}} are either unused or can be local-
- -{{Preconditions.checkArgument(minIdx >= 0);}} in
{{MergingMatchIterator#computeNext()}} could probably just be an assertion,
given it's guaranteed to be true.-
- nit: {{SnapshotView}} and {{IndexpartitionRead}} assume a single DK, but
they also have methods that take arbitrary partition updates. There are
defensive checks in place to make sure those things don't get mixed up, but I
wonder if there's a better way...
- nit: {{prepareInternal()}} in the indexed case doesn't use
{{{}initialData{}}}. I guess we could codify that expectation and throw IAE
otherwise.
- nit: General naming ambiguities, things like {{IndexCompletedRead}} and
{{CompletedIndexRead}} were there is only one implementation of the interface.
- {{IndexCompletedRead.UnfilteredResultIterator}} can't be static because of
type args, so should we just use the {{followUpReads}} from the enclosing class?
- {{IndexPreComplete#preComplete()}} and it's abstract version in the super
don't do anything.
- -Let's return to our conversation around post-filtering and
{{{}hasUnrepairedMatches{}}}. In {{PTIR}} we filter in three places I can see,
and all of them look to be over completed/augmented data. This means that we
should be using strict filtering for all of them. Right now, this actually
might work, because we leave {{hasUnrepairedMatches = false}} for the tracked
case and the existing {{filterTree}} respects it, but if we correct this (see
the comment above), we would have make sure the {{FilterTree}} we use here is
explicitly made strict, just as we force it in RFP at the coordinator (see
{{{}filterReplicaFilteringProtection(){}}}).-
- -We post-filter with {{readHit()}} in {{{}UnfilteredResultIterator{}}}, but
then again in {{filter()}} via {{{}filterCompletedRead(){}}}. Has any of the
underlying data changed there between the {{readHit()}} call with an augmented
{{View}} and {{{}filterCompletedRead(){}}}? If not, are we filtering twice?
(Again, if the data is completed in both cases, the filtering should be
explicitly strict.)-
[~bdeggleston] Let's discuss when you're back from your break if you have some
time.
> CEP-45: Index read support for mutation tracking
> ------------------------------------------------
>
> Key: CASSANDRA-20374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20374
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Consistency/Coordination
> Reporter: Blake Eggleston
> Assignee: Caleb Rackliffe
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Index reads are disabled for mutation tracking and we need to add support for
> them. A big part of this will be adapting the ReplicaFilteringProtection to
> work with the logged read classes.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]