Hoss Man created SOLR-13176:
-------------------------------
Summary: Testing of TLOG Replicas needs to be re-instated, may be
hiding bugs
Key: SOLR-13176
URL: https://issues.apache.org/jira/browse/SOLR-13176
Project: Solr
Issue Type: Sub-task
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Hoss Man
As part of Mark Miller's push to clean up tests, one change he made in his
_big_ SOLR-12801 commit (circa Nov 2018) was to disable the randomized use
of TLOG replicas in a lot of tests.
His comments at the time were that he suspected a lot of the problems he was
seeing were due to a poor implementation of
{{TestInjection.waitForInSyncWithLeader()}} (which only comes into play for
TLOG replicas), ultimately leading to him creating SOLR-12313.
But based on some limited experimentation I did with re-enabling TLOG
replica randomization in some tests after (essentially) removing
{{TestInjection.waitForInSyncWithLeader()}} in SOLR-13168, I'm still seeing a
lot of sporadic test failures when TLOG replicas get used. The only change is
that instead of "failing slow" because of the stalls introduced by
{{TestInjection.waitForInSyncWithLeader()}}, the tests now fail quickly.
*It's not clear if these failures are because the tests have bugs; or if the
tests don't account for the expected behavior of the TLOG replica types in
certain situations; or if the code paths being tested have bugs when dealing
with TLOG replicas.*
----
Bottom line: As things stand today, TLOG replicas aren't being very thoroughly
tested, particularly in edge cases (http partitions, LIR, leader election,
mixed use of replica types, etc.).