[
https://issues.apache.org/jira/browse/SOLR-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754517#comment-16754517
]
Hoss Man commented on SOLR-13176:
---------------------------------
A quick and dirty (non-exhaustive) list of just some of the places that tlog
replicas were originally being tested but are not currently (typically because
a randomized "boolean" is now hardcoded) ...
{noformat}
$ find solr/ -name \*.java | grep test | xargs egrep 'SOLR-12313|waitForInSyncWithLeader|TODO:?\s*tlog'
solr/core/src/test/org/apache/solr/update/TestInPlaceUpdatesDistrib.java: return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/ForceLeaderTest.java: // TODO: SOLR-12313 tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/ChaosMonkeyNothingIsSafeWithPullReplicasTest.java: return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/TestTlogReplica.java:@AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/SOLR-12313")
solr/core/src/test/org/apache/solr/cloud/HttpPartitionTest.java: return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/ChaosMonkeySafeLeaderWithPullReplicasTest.java: return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/ReplaceNodeTest.java: // TODO: tlog replicas do not work correctly in tests due to fault TestInjection#waitForInSyncWithLeader
solr/core/src/test/org/apache/solr/cloud/ChaosMonkeyNothingIsSafeTest.java: return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/api/collections/ShardSplitTest.java: CollectionAdminRequest.Create create = CollectionAdminRequest.createCollection(collectionName, "conf1", 1, 2, 0, 2); // TODO tlog replicas disabled right now.
solr/core/src/test/org/apache/solr/cloud/HttpPartitionOnCommitTest.java: return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/BasicDistributedZk2Test.java: return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/RecoveryAfterSoftCommitTest.java: return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/BasicDistributedZkTest.java: return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/TestCloudRecovery.java: tlogReplicas = 0; // onlyLeaderIndexes?2:0; TODO: SOLR-12313 tlog replicas break tests because
solr/core/src/test/org/apache/solr/cloud/TestCloudRecovery.java: // TestInjection#waitForInSyncWithLeader is broken
{noformat}
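For anyone skimming the grep output, here is a rough, self-contained sketch of the pattern it points at: a per-run randomized choice of replica type that has been pinned to a constant. The class and method names below are hypothetical and are not taken from the Solr test framework; only the {{return false}} and {{onlyLeaderIndexes?2:0}} fragments come from the output above, and {{java.util.Random}} stands in for whatever seeded randomness the real tests use.

```java
import java.util.Random;

/** Hypothetical stand-in for a test base class; names are illustrative, not Solr's. */
public class TlogRandomizationSketch {

    // Current state per the grep output: the choice is pinned to false,
    // so TLOG replicas are never exercised by tests using this hook.
    public static boolean useTlogReplicasHardcoded() {
        return false; // TODO: tlog replicas ... (as in the grep output)
    }

    // Assumed original intent: let seeded test randomness pick the replica type per run.
    public static boolean useTlogReplicasRandomized(Random random) {
        return random.nextBoolean();
    }

    // Mirrors the commented-out expression "onlyLeaderIndexes?2:0" from TestCloudRecovery.
    public static int tlogReplicaCount(boolean onlyLeaderIndexes) {
        return onlyLeaderIndexes ? 2 : 0;
    }

    public static void main(String[] args) {
        // A fixed seed makes the randomized choice reproducible when a run fails.
        Random random = new Random(42L);
        boolean useTlog = useTlogReplicasRandomized(random);
        System.out.println("hardcoded:  " + useTlogReplicasHardcoded());
        System.out.println("randomized: " + useTlog);
        System.out.println("tlog replica count: " + tlogReplicaCount(useTlog));
    }
}
```

Re-enabling coverage in a given test would amount to swapping the hardcoded variant back to the randomized one, which is exactly where the sporadic failures described in the issue start showing up.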
> Testing of TLOG Replicas needs to be re-instated, may be hiding bugs
> --------------------------------------------------------------------
>
> Key: SOLR-13176
> URL: https://issues.apache.org/jira/browse/SOLR-13176
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Priority: Major
>
> As part of Mark Miller's push to clean up tests, one change he made as part of
> his _big_ SOLR-12801 commit (circa Nov 2018) was to disable the randomized
> use of TLOG replicas in a lot of tests.
> His comments at the time were that he suspected a lot of the problems he was
> seeing were due to a poor implementation of
> {{TestInjection.waitForInSyncWithLeader()}} (which only comes into play for
> TLOG replicas), ultimately leading to him creating SOLR-12313.
> But based on some limited experimentation I did with re-enabling TLOG
> replica randomization in some tests after (essentially) removing
> {{TestInjection.waitForInSyncWithLeader()}} in SOLR-13168, I'm still seeing a
> lot of sporadic test failures when TLOG replicas get used... the only change
> is that instead of "failing slow" because of the stalls introduced by
> {{TestInjection.waitForInSyncWithLeader()}}, they now fail quickly.
> *It's not clear if these failures are because the tests have bugs; or if the
> tests don't account for the expected behavior of the TLOG replica types in
> certain situations; or if the code paths being tested have bugs when dealing
> with TLOG replicas.*
> ----
> Bottom line: As things stand today, TLOG replicas aren't being very
> thoroughly tested, particularly in edge cases (HTTP partitions, LIR, leader
> election, mixed use of replica types, etc...)