Searching my Jenkins folder for failures of this test (label:jenkins
"FAILED:  org.apache.solr.cloud.OverseerStatusTest.test"), 26 emails match.
Searching for all Jenkins master build emails since the first failure
email found above (2 days ago), I see 40 messages.
26 out of 40 (65%) is not far from the expected 50% failure rate.
I believe the ratio in the graph you sent, David (currently at 5.7%), is
averaged over a week and includes failures from all branches (I did some
other stats on Jenkins emails that tend to confirm this assumption).
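
As a rough sanity check on "26 out of 40 versus an expected 50%", here is a
minimal sketch in Python. It assumes each master build is an independent
trial that picks distributed updates with probability 0.5, which is a
simplification. The function names are made up for illustration and are not
part of Solr or Jenkins.

```python
from math import comb


def failure_rate(failures, runs):
    """Observed fraction of failing runs."""
    return failures / runs


def two_sided_binomial_p(failures, runs, p=0.5):
    """Exact two-sided binomial test.

    Sums the probability of every outcome that is no more likely than the
    observed one, assuming independent runs that each fail with probability p
    (an assumption, not something measured from the builds).
    """
    def pmf(k):
        return comb(runs, k) * p**k * (1 - p) ** (runs - k)

    observed = pmf(failures)
    # Small tolerance so outcomes tied with the observed one are included.
    return sum(pmf(k) for k in range(runs + 1) if pmf(k) <= observed * (1 + 1e-9))


if __name__ == "__main__":
    print(failure_rate(26, 40))
    print(two_sided_binomial_p(26, 40))
```

With the counts above the p-value comes out at roughly 0.08, so a sample of
40 builds does not clearly contradict a 50% failure rate.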

On Sun, Feb 21, 2021 at 10:53 AM Ilan Ginzburg <[email protected]> wrote:

> Yes Marcus, this is the commit.
>
> David, I would have expected 50% failures, as 50% of the runs use
> distributed updates. I’ll try to understand better as I fix the issue.
>
> Ilan
>
> On Sun 21 Feb 2021 at 06:17, David Smiley <[email protected]> wrote:
>
>> Interesting.  Do you have a guess as to why the failures there are ~5%
>> and not 100% reproducible?
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Sat, Feb 20, 2021 at 6:41 PM Ilan Ginzburg <[email protected]> wrote:
>>
>>> Indeed the issue is due to my changes.
>>>
>>> In OverseerStatusCmd I've skipped some stat collection when running in
>>> distributed cluster state updates mode because I thought these were only
>>> stats related to cluster state updates.
>>> Obviously that was too aggressive and some of the stats are related to
>>> the Collection API.
>>>
>>> I will make sure to skip only the stats that are related to the
>>> cluster state updater and restore the Collection API stats (this applies
>>> when running in distributed cluster state updates mode; otherwise all
>>> stats are returned).
>>>
>>> Tomorrow...
>>>
>>> Ilan
>>>
>>> On Sun, Feb 21, 2021 at 12:22 AM Ilan Ginzburg <[email protected]>
>>> wrote:
>>>
>>>> Thank you David for reporting this.
>>>>
>>>> This seems to be due to my recent changes. I reproduced the failure
>>>> locally and will look at this tomorrow.
>>>>
>>>> With the distributed cluster state updates, I've introduced a
>>>> randomization for using either Overseer based cluster state updates or
>>>> distributed cluster state updates in tests. This failure seems to happen in
>>>> the distributed state update case. I suspect it is due to Overseer
>>>> returning fewer stats than the test expects (which is normal: Overseer
>>>> cannot return stats about cluster state updates if it does not handle
>>>> cluster state updates).
>>>>
>>>> The following line in the logs shows that the run is using distributed
>>>> cluster state updates:
>>>> 972874 INFO  (jetty-launcher-8973-thread-2) [     ]
>>>> o.a.s.c.DistributedClusterStateUpdater Creating
>>>> DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
>>>> will be using distributed cluster state updates.
>>>>
>>>> Ilan
>>>>
>>>>
>>>> On Sat, Feb 20, 2021 at 3:00 PM David Smiley <[email protected]>
>>>> wrote:
>>>>
>>>>> I encountered a failure from OverseerStatusTest locally.  According to
>>>>> our test failure trends, this guy only just recently started failing ~4-5%
>>>>> of the time, but previously was fine.  Only master branch.
>>>>>
>>>>>
>>>>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>>>>>
>>>>> ~ David Smiley
>>>>> Apache Lucene/Solr Search Developer
>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>
>>>>
