[jira] [Commented] (SOLR-13176) Testing of TLOG Replicas needs to be re-instated, may be hiding bugs

JIRA Wed, 30 Jan 2019 16:11:44 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756715#comment-16756715
 ]


Tomás Fernández Löbbe commented on SOLR-13176:
----------------------------------------------

bq. these tests that were randomizing TLOG replicas were still guaranteed to fail 
unless assertions were enabled
Right. I guess nobody noticed this before since it's unlikely tests are run 
without assertions, given the imposed checks.

bq. It sounds like we either need new variants...
Both approaches sound good to me. I lean more toward "make the test invoke 
some waitForAllTlogReplicasInSyncWithLeaders..." because it doesn't pollute 
the real code with test stuff. It does mean that tests need to explicitly 
handle the TLOG case (which I think is something Dat wanted to avoid in the 
first iteration of the code), but it sounds like there is no way around that anyway.
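
To make the test-side option concrete, here is a rough sketch of what such a 
helper could look like. This is not existing Solr code: the class name, the 
use of LongSupplier callbacks to read the leader and replica versions, and the 
"replica version >= leader version" definition of "in sync" are all 
assumptions for illustration. A real helper would need to pull that 
information from the cluster state and the replicas themselves.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.LongSupplier;

/** Hypothetical test-only helper; not part of Solr. */
final class TlogSyncWait {
  private TlogSyncWait() {}

  /**
   * Polls until every replica reports a version at least as new as the
   * leader's, or throws TimeoutException once the timeout elapses.
   * The suppliers are placeholders (assumptions) for however a test would
   * actually read the leader and TLOG replica index versions.
   */
  static void waitForAllTlogReplicasInSyncWithLeaders(
      LongSupplier leaderVersion,
      List<LongSupplier> replicaVersions,
      long timeout,
      TimeUnit unit) throws InterruptedException, TimeoutException {
    final long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (true) {
      // Re-read the leader on every pass, since it may still be indexing.
      final long target = leaderVersion.getAsLong();
      boolean inSync = true;
      for (LongSupplier replica : replicaVersions) {
        if (replica.getAsLong() < target) {
          inSync = false;
          break;
        }
      }
      if (inSync) {
        return;
      }
      // Overflow-safe comparison against the nanoTime deadline.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("TLOG replicas not in sync with leader");
      }
      Thread.sleep(50); // fixed poll interval, chosen arbitrarily
    }
  }
}
```

A test that randomizes replica types would call this after indexing and 
before asserting on search results, and only when TLOG replicas are in use, 
so the production code stays free of test hooks.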

> Testing of TLOG Replicas needs to be re-instated, may be hiding bugs
> --------------------------------------------------------------------
>
>                 Key: SOLR-13176
>                 URL: https://issues.apache.org/jira/browse/SOLR-13176
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>
> As part of Mark Miller's push to clean up tests, one change he made as part of 
> his _big_ SOLR-12801 commit (circa Nov 2018) was to disable the randomized 
> use of TLOG replicas in a lot of tests.
> His comments at the time were that he suspected a lot of the problems he was 
> seeing were due to a poor implementation of 
> {{TestInjection.waitForInSyncWithLeader()}} (which only comes into play for 
> TLOG replicas), ultimately leading to him creating SOLR-12313.
> But based on some limited experimentation I did trying to re-enable TLOG 
> replica randomization in some tests after (essentially) removing 
> {{TestInjection.waitForInSyncWithLeader()}} in SOLR-13168, I'm still seeing a 
> lot of sporadic test failures when TLOG replicas get used... the only change 
> is that instead of "failing slow" because of the stalls introduced by 
> {{TestInjection.waitForInSyncWithLeader()}}, they now fail quickly.
> *It's not clear if these failures are because the tests have bugs; or if the 
> tests don't account for the expected behavior of the TLOG replica types in 
> certain situations; or if the code paths being tested have bugs when dealing 
> with TLOG replicas.*
> ----
> Bottom line: As things stand today, TLOG replicas aren't being very 
> thoroughly tested, particularly in edge cases (HTTP partitions, LIR, leader 
> election, mixed use of replica types, etc...)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)