[ https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168200#comment-17168200 ]
David Capwell commented on CASSANDRA-15907:
-------------------------------------------
FYI org.apache.cassandra.config.DatabaseDescriptorRefTest is failing
https://app.circleci.com/pipelines/github/dcapwell/cassandra/379/workflows/83c3e1f6-3279-4426-8af8-a02926b10774/jobs/1975
{code}
git checkout trunk
git pull --rebase upstream trunk
ant realclean && ant && ant generate-idea-files
ant testclasslist -Dtest.classlistfile=<(echo org/apache/cassandra/config/DatabaseDescriptorRefTest.java) -Dtest.classlistprefix=unit
...
[junit-timeout] Testcase: testDatabaseDescriptorRef(org.apache.cassandra.config.DatabaseDescriptorRefTest): FAILED
[junit-timeout] null
[junit-timeout] junit.framework.AssertionFailedError
[junit-timeout] at org.apache.cassandra.config.DatabaseDescriptorRefTest.checkViolations(DatabaseDescriptorRefTest.java:303)
[junit-timeout] at org.apache.cassandra.config.DatabaseDescriptorRefTest.testDatabaseDescriptorRef(DatabaseDescriptorRefTest.java:287)
[junit-timeout]
[junit-timeout]
[junit-timeout] Test org.apache.cassandra.config.DatabaseDescriptorRefTest FAILED
[junitreport] Processing /Users/davidcapwell/src/github/apache/cassandra-trunk/build/test/TESTS-TestSuites.xml to /var/folders/cm/08cddl2s25j7fq3jdb76gh4r0000gn/T/null2013776906
[junitreport] Loading stylesheet jar:file:/usr/local/Cellar/ant/1.10.7/libexec/lib/ant-junit.jar!/org/apache/tools/ant/taskdefs/optional/junit/xsl/junit-frames.xsl
[junitreport] Transform time: 277ms
[junitreport] Deleting: /var/folders/cm/08cddl2s25j7fq3jdb76gh4r0000gn/T/null2013776906
BUILD FAILED
/Users/davidcapwell/src/github/apache/cassandra-trunk/build.xml:1981: The following error occurred while executing this line:
/Users/davidcapwell/src/github/apache/cassandra-trunk/build.xml:1871: Some test(s) failed.
{code}
> Operational Improvements & Hardening for Replica Filtering Protection
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-15907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15907
> Project: Cassandra
> Issue Type: Improvement
> Components: Consistency/Coordination, Feature/2i Index
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Labels: 2i, memory
> Fix For: 3.0.22, 3.11.8, 4.0-beta2
>
> Time Spent: 8h 50m
> Remaining Estimate: 0h
>
> CASSANDRA-8272 uses additional space on the heap to ensure correctness for 2i
> and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a
> few things we should follow up on, however, to make life a bit easier for
> operators and generally de-risk usage:
> (Note: Line numbers are based on {{trunk}} as of
> {{3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb}}.)
> *Minor Optimizations*
> * {{ReplicaFilteringProtection:114}} - Given we size them up-front, we may be
> able to use simple arrays instead of lists for {{rowsToFetch}} and
> {{originalPartitions}}. Alternatively (or also), we may be able to null out
> references in these two collections more aggressively. (ex. Using
> {{ArrayList#set()}} instead of {{get()}} in {{queryProtectedPartitions()}},
> assuming we pass {{toFetch}} as an argument to {{querySourceOnKey()}}.)
> * {{ReplicaFilteringProtection:323}} - We may be able to use
> {{EncodingStats.merge()}} and remove the custom {{stats()}} method.
> * {{DataResolver:111 & 228}} - Cache an instance of
> {{UnaryOperator#identity()}} instead of creating one on the fly.
> * {{ReplicaFilteringProtection:217}} - We may be able to scatter/gather
> rather than serially querying every row that needs to be completed. This
> isn't a clear win perhaps, given it targets the latency of single queries and
> adds some complexity. (Certainly a decent candidate to kick out of this
> issue entirely.)
> *Documentation and Intelligibility*
> * There are a few places (CHANGES.txt, tracing output in
> {{ReplicaFilteringProtection}}, etc.) where we mention "replica-side
> filtering protection" (which makes it seem like the coordinator doesn't
> filter) rather than "replica filtering protection" (which sounds more like
> what we actually do, which is protect ourselves against incorrect replica
> filtering results). It's a minor fix, but would avoid confusion.
> * The method call chain in {{DataResolver}} might be a bit simpler if we put
> the {{repairedDataTracker}} in {{ResolveContext}}.
> *Testing*
> * I want to bite the bullet and get some basic tests for RFP (including any
> guardrails we might add here) onto the in-JVM dtest framework.
> *Guardrails*
> * As it stands, we don't have a way to enforce an upper bound on the memory
> usage of {{ReplicaFilteringProtection}}, which caches row responses from the
> first round of requests. (Remember, these are later merged with the
> second round of results to complete the data for filtering.) Operators will
> likely need a way to protect themselves, i.e. simply fail queries if they hit
> a particular threshold rather than GC nodes into oblivion. (Having control
> over limits and page sizes doesn't quite get us there, because stale results
> _expand_ the number of incomplete results we must cache.) The fun question is
> how we do this, with the primary axes being scope (per-query, global, etc.)
> and granularity (per-partition, per-row, per-cell, actual heap usage, etc.).
> My starting disposition on the right trade-off between
> performance/complexity and accuracy is having something along the lines of
> cached rows per query. Prior art suggests this probably makes sense alongside
> things like {{tombstone_failure_threshold}} in {{cassandra.yaml}}.
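
The "cached rows per query" guardrail described above could be sketched roughly as follows. This is only an illustration of the idea, not the implementation in the patch: the class name {{CachedRowsGuardrail}}, the warn/fail threshold pair, and the exception type are all hypothetical, and the sketch assumes a single thread accounts for the rows cached by one query.

```java
// Hypothetical sketch: a per-query counter that warns once and then fails the
// query when too many rows are cached. None of these names come from Cassandra.
final class CachedRowsGuardrail {

    /** Thrown when a single query caches more rows than the fail threshold allows. */
    static final class TooManyCachedRowsException extends RuntimeException {
        TooManyCachedRowsException(String message) {
            super(message);
        }
    }

    private final int warnThreshold; // assumed knob, similar in spirit to a yaml threshold
    private final int failThreshold; // assumed knob; exceeding it aborts the query
    private int cachedRows = 0;
    private boolean warned = false;

    CachedRowsGuardrail(int warnThreshold, int failThreshold) {
        if (warnThreshold < 0 || failThreshold < warnThreshold) {
            throw new IllegalArgumentException("expected 0 <= warnThreshold <= failThreshold");
        }
        this.warnThreshold = warnThreshold;
        this.failThreshold = failThreshold;
    }

    /** Call once for every row response cached from the first round of requests. */
    void onRowCached() {
        cachedRows++;
        if (cachedRows > failThreshold) {
            // Fail this query rather than let the cache grow without bound.
            throw new TooManyCachedRowsException(
                "Query cached " + cachedRows + " rows, above the fail threshold of " + failThreshold);
        }
        if (cachedRows > warnThreshold && !warned) {
            warned = true; // warn only once per query
            System.err.println(
                "Query cached " + cachedRows + " rows, above the warn threshold of " + warnThreshold);
        }
    }

    int cachedRows() {
        return cachedRows;
    }

    boolean hasWarned() {
        return warned;
    }
}
```

Counting rows rather than measuring heap keeps the check cheap, at the cost of accuracy when row sizes vary widely, which is the performance/complexity versus accuracy trade-off mentioned above.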