[ 
https://issues.apache.org/jira/browse/SOLR-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754517#comment-16754517
 ] 

Hoss Man commented on SOLR-13176:
---------------------------------


A quick and dirty (non-exhaustive) list of just some of the places that tlog 
replicas were originally being tested but are not currently (typically because 
a randomized "boolean" is now hardcoded) ...

{noformat}
$ find solr/ -name \*.java | grep test | xargs egrep 'SOLR-12313|waitForInSyncWithLeader|TODO:?\s*tlog'
solr/core/src/test/org/apache/solr/update/TestInPlaceUpdatesDistrib.java:    return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/ForceLeaderTest.java:  // TODO: SOLR-12313 tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/ChaosMonkeyNothingIsSafeWithPullReplicasTest.java:    return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/TestTlogReplica.java:@AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/SOLR-12313")
solr/core/src/test/org/apache/solr/cloud/HttpPartitionTest.java:    return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/ChaosMonkeySafeLeaderWithPullReplicasTest.java:    return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/ReplaceNodeTest.java:    // TODO: tlog replicas do not work correctly in tests due to fault TestInjection#waitForInSyncWithLeader
solr/core/src/test/org/apache/solr/cloud/ChaosMonkeyNothingIsSafeTest.java:    return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/api/collections/ShardSplitTest.java:    CollectionAdminRequest.Create create = CollectionAdminRequest.createCollection(collectionName, "conf1", 1, 2, 0, 2); // TODO tlog replicas disabled right now.
solr/core/src/test/org/apache/solr/cloud/HttpPartitionOnCommitTest.java:    return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/BasicDistributedZk2Test.java:    return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/RecoveryAfterSoftCommitTest.java:    return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/BasicDistributedZkTest.java:    return false; // TODO: tlog replicas makes commits take way to long due to what is likely a bug and it's TestInjection use
solr/core/src/test/org/apache/solr/cloud/TestCloudRecovery.java:    tlogReplicas = 0; // onlyLeaderIndexes?2:0; TODO: SOLR-12313 tlog replicas break tests because
solr/core/src/test/org/apache/solr/cloud/TestCloudRecovery.java:                          // TestInjection#waitForInSyncWithLeader is broken

{noformat}
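
For anyone not familiar with the pattern in the grep output, here is a minimal, hypothetical sketch of the difference between a hardcoded choice and a randomized one. The class and method names are invented for illustration and are not taken from the Solr code base; the only API used is {{java.util.Random}}.

```java
import java.util.Random;

/** Hypothetical illustration only; these names are not from the Solr code base. */
public class TlogRandomizationSketch {

    /** Pinned choice: TLOG replicas are never selected, so they get no coverage. */
    static boolean useTlogReplicasHardcoded() {
        return false; // same shape as the "return false; // TODO: tlog replicas ..." lines
    }

    /** Randomized choice: the seeded Random decides, so both paths run over many seeds. */
    static boolean useTlogReplicasRandomized(Random random) {
        return random.nextBoolean();
    }

    public static void main(String[] args) {
        // One seeded generator, many draws: a fixed seed makes the sequence reproducible.
        Random random = new Random(42L);
        int tlogRuns = 0;
        for (int i = 0; i < 100; i++) {
            if (useTlogReplicasRandomized(random)) {
                tlogRuns++;
            }
        }
        System.out.println("hardcoded choice: " + useTlogReplicasHardcoded());
        System.out.println("randomized TLOG picks out of 100: " + tlogRuns);
    }
}
```

With the hardcoded form, any bug specific to TLOG replicas can never surface in these tests. With the randomized form, a failure can be replayed by reusing the same seed.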

> Testing of TLOG Replicas needs to be re-instated, may be hiding bugs
> --------------------------------------------------------------------
>
>                 Key: SOLR-13176
>                 URL: https://issues.apache.org/jira/browse/SOLR-13176
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>
> As part of Mark Miller's push to clean up tests, one change he made as part of 
> his _big_ SOLR-12801 commit (circa Nov 2018) was to disable the randomized 
> use of TLOG replicas in a lot of tests.
> His comments at the time were that he suspected a lot of the problems he was 
> seeing were due to a poor implementation of 
> {{TestInjection.waitForInSyncWithLeader()}} (which only comes into play for 
> TLOG replicas), ultimately leading to him creating SOLR-12313.
> But based on some limited experimentation I did with re-enabling TLOG 
> replica randomization in some tests after (essentially) removing 
> {{TestInjection.waitForInSyncWithLeader()}} in SOLR-13168, I'm still seeing a 
> lot of sporadic test failures when TLOG replicas get used. The only change 
> is that instead of "failing slow" because of the stalls introduced by 
> {{TestInjection.waitForInSyncWithLeader()}}, they now fail quickly.
> *It's not clear if these failures are because the tests have bugs; or if the 
> tests don't account for the expected behavior of the TLOG replica types in 
> certain situations; or if the code paths being tested have bugs when dealing 
> with TLOG replicas.*
> ----
> Bottom line: As things stand today, TLOG replicas aren't being very 
> thoroughly tested, particularly in edge cases (HTTP partitions, LIR, leader 
> election, mixed use of replica types, etc.)


