Ah; that makes total sense; thanks.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Feb 21, 2021 at 12:06 PM Ilan Ginzburg <ilans...@gmail.com> wrote:

> Searching my Jenkins mail folder for failures of this test (label:jenkins
> "FAILED:  org.apache.solr.cloud.OverseerStatusTest.test"), 26 emails match.
> Searching for all Jenkins master build emails since the first failure
> email found above (2 days ago), I see 40 messages.
> 26 out of 40 (65%) is not far from the expected 50% failure rate.
> I believe the ratio in the graph you sent, David (currently at 5.7%), is
> averaged over a week and includes failures from all branches (I did some
> other stats on Jenkins emails that tend to confirm this assumption).
>
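The "26 out of 40 versus an expected 50%" comparison above can be checked with a few lines of arithmetic. This is only a sketch: it assumes each of the 40 emails is one independent build that fails with probability 0.5, which the thread does not establish, and the class and method names are made up for illustration.

```java
import java.util.Locale;

class FailureRateCheck {
    /** Fraction of builds that failed. */
    static double observedRate(int failures, int builds) {
        return (double) failures / builds;
    }

    /**
     * Distance of the observed failure count from the binomial expectation,
     * in standard deviations (assumes independent builds).
     */
    static double zScore(int failures, int builds, double expectedRate) {
        double mean = builds * expectedRate;
        double stdDev = Math.sqrt(builds * expectedRate * (1.0 - expectedRate));
        return (failures - mean) / stdDev;
    }

    public static void main(String[] args) {
        int failures = 26; // matching failure emails, from the thread
        int builds = 40;   // master build emails in the same period, from the thread
        // prints "observed rate: 0.650"
        System.out.println(String.format(Locale.ROOT, "observed rate: %.3f",
                observedRate(failures, builds)));
        // prints "z-score vs 50%: 1.90"
        System.out.println(String.format(Locale.ROOT, "z-score vs 50%%: %.2f",
                zScore(failures, builds, 0.5)));
    }
}
```

Under those assumptions, 26 failures is a bit under two standard deviations above the expected 20, so it is on the high side but not wildly out of line with a 50% rate.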
> On Sun, Feb 21, 2021 at 10:53 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
>> Yes Marcus, this is the commit.
>>
>> David, I would have expected 50% failures, as 50% of the runs use
>> distributed updates. I’ll try to understand better as I fix the issue.
>>
>> Ilan
>>
>> On Sun 21 Feb 2021 at 06:17, David Smiley <dsmi...@apache.org> wrote:
>>
>>> Interesting.  Do you have a guess as to why the failures there are ~5%
>>> and not 100% reproducible?
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Sat, Feb 20, 2021 at 6:41 PM Ilan Ginzburg <ilans...@gmail.com>
>>> wrote:
>>>
>>>> Indeed the issue is due to my changes.
>>>>
>>>> In OverseerStatusCmd I've skipped some stat collection when running in
>>>> distributed cluster state updates mode because I thought these were only
>>>> stats related to cluster state updates.
>>>> Obviously that was too aggressive and some of the stats are related to
>>>> the Collection API.
>>>>
>>>> When running in distributed cluster state updates mode, I will skip
>>>> only the stats related to the cluster state updater and restore the
>>>> Collection API stats. In the other mode, all stats are returned.
>>>>
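The fix described above (drop only the cluster-state-update stats in distributed mode, keep everything else) could look roughly like the following. This is a hypothetical sketch, not the actual OverseerStatusCmd code: the class, the method, and the section names are all invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class OverseerStatsFilterSketch {
    // Hypothetical section names, for illustration only; not taken from Solr.
    static final Set<String> CLUSTER_STATE_UPDATE_SECTIONS =
            Set.of("cluster_state_update_operations", "cluster_state_update_queue");

    /**
     * Returns the stats to report. In distributed cluster state updates mode,
     * sections that only describe cluster state updates are left out; all
     * other sections (such as Collection API stats) are kept. In
     * Overseer-based mode, every section is returned.
     */
    static Map<String, Object> statsToReturn(Map<String, Object> allStats,
                                             boolean distributedUpdates) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : allStats.entrySet()) {
            boolean updateOnly = CLUSTER_STATE_UPDATE_SECTIONS.contains(entry.getKey());
            if (!distributedUpdates || !updateOnly) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> all = new LinkedHashMap<>();
        all.put("collection_api_operations", 3);   // hypothetical section
        all.put("cluster_state_update_queue", 7);  // hypothetical section
        System.out.println(statsToReturn(all, true).keySet());
        System.out.println(statsToReturn(all, false).keySet());
    }
}
```

Filtering by an explicit set of section names keeps the "too aggressive" failure mode visible: anything not listed is returned by default.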
>>>> Tomorrow...
>>>>
>>>> Ilan
>>>>
>>>> On Sun, Feb 21, 2021 at 12:22 AM Ilan Ginzburg <ilans...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you David for reporting this.
>>>>>
>>>>> This seems to be due to my recent changes. I can reproduce the failure
>>>>> locally and will look at this tomorrow.
>>>>>
>>>>> With the distributed cluster state updates, I've introduced a
>>>>> randomization in tests that uses either Overseer-based cluster state
>>>>> updates or distributed cluster state updates. This failure seems to
>>>>> happen in the distributed state update case. I suspect it is due to
>>>>> Overseer returning fewer stats than the test expects (which is expected:
>>>>> Overseer cannot return stats about cluster state updates if it does not
>>>>> handle cluster state updates).
>>>>>
>>>>> The following line in the logs shows that the run is using distributed
>>>>> cluster state updates:
>>>>> 972874 INFO  (jetty-launcher-8973-thread-2) [     ]
>>>>> o.a.s.c.DistributedClusterStateUpdater Creating
>>>>> DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
>>>>> will be using distributed cluster state updates.
>>>>>
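To tell which mode a failing run used, one can scan its log for the marker quoted above. A minimal sketch, assuming the log is available as a list of lines and that the `useDistributedStateUpdate=true` text appears unbroken on one line (it is only wrapped here by the mail client); the class and method names are made up.

```java
import java.util.List;

class DistributedModeLogCheck {
    // Marker text copied from the log line quoted in the thread.
    static final String MARKER = "useDistributedStateUpdate=true";

    /** True if any log line reports that distributed cluster state updates are on. */
    static boolean usesDistributedUpdates(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains(MARKER)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "972874 INFO  (jetty-launcher-8973-thread-2) [     ] "
                + "o.a.s.c.DistributedClusterStateUpdater Creating "
                + "DistributedClusterStateUpdater with useDistributedStateUpdate=true. "
                + "Solr will be using distributed cluster state updates.");
        System.out.println(usesDistributedUpdates(log)); // prints "true"
    }
}
```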
>>>>> Ilan
>>>>>
>>>>>
>>>>> On Sat, Feb 20, 2021 at 3:00 PM David Smiley <dsmi...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I encountered a failure from OverseerStatusTest locally.  According
>>>>>> to our test failure trends, this guy only just recently started failing
>>>>>> ~4-5% of the time, but previously was fine.  Only master branch.
>>>>>>
>>>>>>
>>>>>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>>>>>>
>>>>>> ~ David Smiley
>>>>>> Apache Lucene/Solr Search Developer
>>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>>
>>>>>
