[ https://issues.apache.org/jira/browse/CASSANDRA-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168200#comment-17168200 ]
David Capwell edited comment on CASSANDRA-15907 at 7/30/20, 8:35 PM:
---------------------------------------------------------------------

FYI org.apache.cassandra.config.DatabaseDescriptorRefTest is failing

https://app.circleci.com/pipelines/github/dcapwell/cassandra/379/workflows/83c3e1f6-3279-4426-8af8-a02926b10774/jobs/1975

{code}
git checkout trunk
git pull --rebase upstream trunk
ant realclean && ant && ant generate-idea-files
ant testclasslist -Dtest.classlistfile=<(echo org/apache/cassandra/config/DatabaseDescriptorRefTest.java) -Dtest.classlistprefix=unit
...
[junit-timeout] Testcase: testDatabaseDescriptorRef(org.apache.cassandra.config.DatabaseDescriptorRefTest): FAILED
[junit-timeout] null
[junit-timeout] junit.framework.AssertionFailedError
[junit-timeout] at org.apache.cassandra.config.DatabaseDescriptorRefTest.checkViolations(DatabaseDescriptorRefTest.java:303)
[junit-timeout] at org.apache.cassandra.config.DatabaseDescriptorRefTest.testDatabaseDescriptorRef(DatabaseDescriptorRefTest.java:287)
[junit-timeout]
[junit-timeout]
[junit-timeout] Test org.apache.cassandra.config.DatabaseDescriptorRefTest FAILED
[junitreport] Processing /Users/davidcapwell/src/github/apache/cassandra-trunk/build/test/TESTS-TestSuites.xml to /var/folders/cm/08cddl2s25j7fq3jdb76gh4r0000gn/T/null2013776906
[junitreport] Loading stylesheet jar:file:/usr/local/Cellar/ant/1.10.7/libexec/lib/ant-junit.jar!/org/apache/tools/ant/taskdefs/optional/junit/xsl/junit-frames.xsl
[junitreport] Transform time: 277ms
[junitreport] Deleting: /var/folders/cm/08cddl2s25j7fq3jdb76gh4r0000gn/T/null2013776906

BUILD FAILED
/Users/davidcapwell/src/github/apache/cassandra-trunk/build.xml:1981: The following error occurred while executing this line:
/Users/davidcapwell/src/github/apache/cassandra-trunk/build.xml:1871: Some test(s) failed.
{code}

Looks like ClearSnapshotTest is failing as well; not sure why it passed in CircleCI.

{code}
ant testclasslist -Dtest.classlistfile=<(echo org/apache/cassandra/tools/ClearSnapshotTest.java) -Dtest.classlistprefix=unit
...
[junit-timeout] Testsuite: org.apache.cassandra.tools.ClearSnapshotTest Tests run: 4, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 8.175 sec
[junit-timeout]
[junit-timeout] Testcase: testClearSnapshot_RemoveMultiple(org.apache.cassandra.tools.ClearSnapshotTest): FAILED
[junit-timeout] null
[junit-timeout] junit.framework.AssertionFailedError
[junit-timeout] at org.apache.cassandra.tools.ToolRunner.assertEmptyStdErr(ToolRunner.java:338)
[junit-timeout] at org.apache.cassandra.tools.ToolRunner.waitAndAssertOnCleanExit(ToolRunner.java:333)
[junit-timeout] at org.apache.cassandra.tools.ClearSnapshotTest.testClearSnapshot_RemoveMultiple(ClearSnapshotTest.java:91)
[junit-timeout]
[junit-timeout]
[junit-timeout] Testcase: testClearSnapshot_NoArgs(org.apache.cassandra.tools.ClearSnapshotTest): FAILED
[junit-timeout] null
[junit-timeout] junit.framework.AssertionFailedError
[junit-timeout] at org.apache.cassandra.tools.ToolRunner.assertEmptyStdErr(ToolRunner.java:338)
[junit-timeout] at org.apache.cassandra.tools.ToolRunner.waitAndAssertOnCleanExit(ToolRunner.java:333)
[junit-timeout] at org.apache.cassandra.tools.ClearSnapshotTest.testClearSnapshot_NoArgs(ClearSnapshotTest.java:61)
[junit-timeout]
[junit-timeout]
[junit-timeout] Testcase: testClearSnapshot_RemoveByName(org.apache.cassandra.tools.ClearSnapshotTest): FAILED
[junit-timeout] null
[junit-timeout] junit.framework.AssertionFailedError
[junit-timeout] at org.apache.cassandra.tools.ToolRunner.assertEmptyStdErr(ToolRunner.java:338)
[junit-timeout] at org.apache.cassandra.tools.ToolRunner.waitAndAssertOnCleanExit(ToolRunner.java:333)
[junit-timeout] at org.apache.cassandra.tools.ClearSnapshotTest.testClearSnapshot_RemoveByName(ClearSnapshotTest.java:75)
[junit-timeout]
[junit-timeout]
[junit-timeout]
Test org.apache.cassandra.tools.ClearSnapshotTest FAILED
{code}

> Operational Improvements & Hardening for Replica Filtering Protection
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-15907
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15907
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Consistency/Coordination, Feature/2i Index
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>              Labels: 2i, memory
>             Fix For: 3.0.22, 3.11.8, 4.0-beta2
>
>          Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> CASSANDRA-8272 uses additional space on the heap to ensure correctness for 2i
> and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a
> few things we should follow up on, however, to make life a bit easier for
> operators and generally de-risk usage:
> (Note: Line numbers are based on {{trunk}} as of
> {{3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb}}.)
> *Minor Optimizations*
> * {{ReplicaFilteringProtection:114}} - Given we size them up-front, we may be
> able to use simple arrays instead of lists for {{rowsToFetch}} and
> {{originalPartitions}}. Alternatively (or also), we may be able to null out
> references in these two collections more aggressively. (ex. Using
> {{ArrayList#set()}} instead of {{get()}} in {{queryProtectedPartitions()}},
> assuming we pass {{toFetch}} as an argument to {{querySourceOnKey()}}.)
> * {{ReplicaFilteringProtection:323}} - We may be able to use
> {{EncodingStats.merge()}} and remove the custom {{stats()}} method.
> * {{DataResolver:111 & 228}} - Cache an instance of
> {{UnaryOperator#identity()}} instead of creating one on the fly.
> * {{ReplicaFilteringProtection:217}} - We may be able to scatter/gather
> rather than serially querying every row that needs to be completed. This
> isn't a clear win perhaps, given it targets the latency of single queries and
> adds some complexity. (Certainly a decent candidate to kick even out of this
> issue.)
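The {{ArrayList#set()}} suggestion in the first bullet rests on the fact that {{List#set()}} returns the element previously stored at that index, so a single call can both read an entry and drop the list's reference to it. A minimal sketch of that idea follows; the class and method names ({{ReleaseOnRead}}, {{takeAndRelease}}) are hypothetical and do not come from the Cassandra code base.

```java
import java.util.List;

/** Hypothetical illustration only; not taken from ReplicaFilteringProtection. */
final class ReleaseOnRead
{
    private ReleaseOnRead() {}

    /**
     * Returns the element at {@code index} and replaces it with null, so the
     * list no longer keeps that element reachable. The list's size is unchanged.
     */
    static <T> T takeAndRelease(List<T> items, int index)
    {
        // List#set returns the element previously at the given position.
        return items.set(index, null);
    }
}
```

Compared with {{get()}}, the caller becomes the only holder of the returned element, which may let it be collected earlier; the trade-off is that later readers of the list must tolerate null slots.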
> *Documentation and Intelligibility*
> * There are a few places (CHANGES.txt, tracing output in
> {{ReplicaFilteringProtection}}, etc.) where we mention "replica-side
> filtering protection" (which makes it seem like the coordinator doesn't
> filter) rather than "replica filtering protection" (which sounds more like
> what we actually do, which is protect ourselves against incorrect replica
> filtering results). It's a minor fix, but would avoid confusion.
> * The method call chain in {{DataResolver}} might be a bit simpler if we put
> the {{repairedDataTracker}} in {{ResolveContext}}.
> *Testing*
> * I want to bite the bullet and get some basic tests for RFP (including any
> guardrails we might add here) onto the in-JVM dtest framework.
> *Guardrails*
> * As it stands, we don't have a way to enforce an upper bound on the memory
> usage of {{ReplicaFilteringProtection}}, which caches row responses from the
> first round of requests. (Remember, these are later merged with the
> second round of results to complete the data for filtering.) Operators will
> likely need a way to protect themselves, i.e. simply fail queries if they hit
> a particular threshold rather than GC nodes into oblivion. (Having control
> over limits and page sizes doesn't quite get us there, because stale results
> _expand_ the number of incomplete results we must cache.) The fun question is
> how we do this, with the primary axes being scope (per-query, global, etc.)
> and granularity (per-partition, per-row, per-cell, actual heap usage, etc.).
> My starting disposition on the right trade-off between
> performance/complexity and accuracy is having something along the lines of
> cached rows per query. Prior art suggests this probably makes sense alongside
> things like {{tombstone_failure_threshold}} in {{cassandra.yaml}}.
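To make the "cached rows per query" idea concrete, here is a rough sketch of a per-query counter that fails the query once a limit is crossed. Everything in it is an assumption for illustration: the class name {{CachedRowGuard}}, the {{failThreshold}} parameter, and the choice of {{IllegalStateException}} are hypothetical and do not describe an existing Cassandra API or configuration option.

```java
/**
 * Hypothetical per-query guardrail: counts rows cached from the first round
 * of responses and fails once the count exceeds a configured limit.
 * Not thread-safe; assumes one instance per query, updated from one thread.
 */
final class CachedRowGuard
{
    private final int failThreshold;
    private int cachedRows;

    CachedRowGuard(int failThreshold)
    {
        if (failThreshold < 1)
            throw new IllegalArgumentException("failThreshold must be positive");
        this.failThreshold = failThreshold;
    }

    /** Call once for every row added to the per-query cache. */
    void onRowCached()
    {
        cachedRows++;
        if (cachedRows > failThreshold)
            throw new IllegalStateException("Query cached " + cachedRows
                                            + " rows, exceeding the limit of " + failThreshold);
    }

    int cachedRows()
    {
        return cachedRows;
    }
}
```

Counting rows rather than measuring heap usage keeps the check cheap, at the cost of accuracy when row sizes vary widely, which is the trade-off the description weighs.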
--
This message was sent by Atlassian Jira (v8.3.4#803005)