[ 
https://issues.apache.org/jira/browse/SOLR-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754596#comment-16754596
 ] 

Tomás Fernández Löbbe commented on SOLR-13176:
----------------------------------------------

Much of the TLOG testing was added for the "onlyLeaderIndexes" changes, so 
[~caomanhdat] can probably comment on it more. I'm not sure I follow exactly, 
but if the logic in {{waitForInSyncWithLeader}} is commented out and just 
returns immediately I expect lots of tests to fail, something like:
 * add document
 * commit
 * search

won't work. All those other tests were not modified to handle TLOG replicas, 
they assume the same behavior of NRT. (Except for {{TestTlogReplica}} and maybe 
{{ChaosMonkeySafeLeaderWithPullReplicasTest}})

> Testing of TLOG Replicas needs to be re-instated, may be hiding bugs
> --------------------------------------------------------------------
>
>                 Key: SOLR-13176
>                 URL: https://issues.apache.org/jira/browse/SOLR-13176
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>
> As part of mark miller's push to cleanup tests, one change he made as part of 
> his _big__ SOLR-12801 commit (circa Nov2018) was to dissable the randomized 
> use of TLOG replicas in a lot of tests
> His comments at the time were that he suspected a lot of the problems he was 
> seeing was due to a poor implementation of 
> {{TestInjection.waitForInSyncWithLeader()}} (which only comes into play for 
> TLOG replicas) ultimately leading to him creating SOLR-12313.
> But based on some limited experimentation I made w/trying to re-enable TLOG 
> replica randomization in some tests after (essentially) removing 
> {{TestInjection.waitForInSyncWithLeader()}} in SOLR-13168 i'm still seeing a 
> lot of sporadic test failures when TLOG replicas get used... the only change 
> is that instead of "failing slow" because of the stalls introduced by 
> {{TestInjection.waitForInSyncWithLeader()}} they started failing quickly.
> *It's not clear if these failures are because the tests have bugs; or if the 
> tests don't account for the expected behavior of the TLOG replica types in 
> certain situations; or if the code paths being tested have bugs when dealing 
> with TLOG replicas.*
> ----
> Bottom line: As things stand today, TLOG replicas aren't being very 
> thoroughly tested, particularly in edge cases (http partitions, LIR, leader 
> election, mixed used of replica types, etc...)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to