Ishan/Noble, thanks for taking a look at this. I've only just started looking into the cause, so I'm sure you have better context on why this is failing and whether it still makes sense to release with this issue.
FYI, I was finally able to get a successful smoke test run, but it took me over 7 runs to get there.

Also, can you confirm how you ran the test? You might be getting lucky with the randomization here. Tim and I both commented out the randomization for USE_PER_REPLICA_STATE, and hardcoding the value to true made the test fail consistently. With the default (false), the test passed 100% of the time.

If you think we can have this fix ready before the release, it might make more sense to ship a single release, as users then wouldn't have to track what's broken in a released version.

I'd still like to spend some more time on this tomorrow before voting, but at least the smoke test is out of the way. I'll try to debug this tomorrow.

On Mon, Feb 15, 2021 at 8:40 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:

> I tried light beasting the test on branch_8_8:
> ant -Dtests.dups=1 -Dtests.iters=5 -Dbeast.iters=5 -Dtestcase=SolrCloudReportersTest beast
>
> No failures.
>
> [beaster] Beast round 1 results: /home/ishan/code/lucene-solr/solr/build/solr-core/test/1
> [beaster] Beast round 2 results: /home/ishan/code/lucene-solr/solr/build/solr-core/test/2
> [beaster] Beast round 3 results: /home/ishan/code/lucene-solr/solr/build/solr-core/test/3
> [beaster] Beast round 4 results: /home/ishan/code/lucene-solr/solr/build/solr-core/test/4
> [beaster] Beast round 5 results: /home/ishan/code/lucene-solr/solr/build/solr-core/test/5
> [beaster] Beasting finished Successfully.
>
> On Tue, Feb 16, 2021 at 10:07 AM Noble Paul <noble.p...@gmail.com> wrote:
>
>> @Anshum Gupta
>>
>> I think we should not hold up the release of RC1 because of that failure.
>>
>> This is a new feature and new features take time to get hardened.
>>
>> However, We can investigate and fix this anyway.
>>
>> If required, we can do a 8.8.3
>>
>> On Tue, Feb 16, 2021 at 3:10 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
>> >
>> > Here's my +1 for the RC1.
>> >
>> > SUCCESS! [0:42:38.936787]
>> >
>> > On Tue, Feb 16, 2021 at 9:02 AM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
>> >>
>> >> Per Replica States is a new feature introduced in 8.8.0. It will require a critical bugfix (SOLR-15138) immediately after 8.8.1 (in a 8.8.2 release). If this issue is confirmed to be PRS related, then I think we should continue with this release and fix PRS in 8.8.2.
>> >>
>> >> However, if you still want us to investigate and fix this issue now, we can take a look. If you have a failing seed handy, please let me know.
>> >>
>> >> On Tue, Feb 16, 2021 at 8:33 AM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
>> >>>
>> >>> Surprising. I'll take a look.
>> >>>
>> >>> On Tue, 16 Feb, 2021, 7:29 am Anshum Gupta, <ans...@anshumgupta.net> wrote:
>> >>>>
>> >>>> I've unsuccessfully tried getting the smoketester to pass and have had 6 fails so far.
>> >>>>
>> >>>> At this point it seems like SolrCloudReporterTest and AutoscalingHistoryTest tests are failing pretty consistently for me.
>> >>>>
>> >>>> The former is a new failure, and seems to be caused by the USE_PER_REPLICA_STATE randomization.
>> >>>>
>> >>>> Both Tim and me tried running the tests without the randomization and defaulting that property to false gets the tests to pass, however it seems to be failing every time the value for USE_PER_REPLICA_STATE is set to true.
>> >>>>
>> >>>> I'm not voting -1 yet, as I'm not sure how much this affects the build vs the test, but once we have a clearer picture, we might need a fix and have to respin this.
>> >>>>
>> >>>> -Anshum
>> >>>>
>> >>>> On Sun, Feb 14, 2021 at 8:31 AM Timothy Potter <thelabd...@gmail.com> wrote:
>> >>>>>
>> >>>>> Looks like an extra space got added on the end of the python3 command, try this one:
>> >>>>>
>> >>>>> python3 -u dev-tools/scripts/smokeTestRelease.py https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC1-rev6a50a0315ac7e4979abb0b530857c7795bb3b928
>> >>>>>
>> >>>>> On Sun, Feb 14, 2021 at 9:26 AM Timothy Potter <thelabd...@apache.org> wrote:
>> >>>>>>
>> >>>>>> Please vote for release candidate 1 for Lucene/Solr 8.8.1
>> >>>>>>
>> >>>>>> The artifacts can be downloaded from:
>> >>>>>>
>> >>>>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC1-rev6a50a0315ac7e4979abb0b530857c7795bb3b928
>> >>>>>>
>> >>>>>> You can run the smoke tester directly with this command:
>> >>>>>>
>> >>>>>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>> >>>>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC1-rev6a50a0315ac7e4979abb0b530857c7795bb3b928
>> >>>>>>
>> >>>>>> The vote will be open for at least 72 hours i.e. until 2021-02-17 17:00 UTC.
>> >>>>>>
>> >>>>>> Here is my +1 ~ SUCCESS! [0:50:06.728441]
>> >>>>>>
>> >>>>>> In addition to the smoke test, I built a Docker image from solr-8.8.1.tgz locally and verified:
>> >>>>>>
>> >>>>>> a. A rolling upgrade of a 3-node 8.7.0 cluster to the 8.8.1 RC completes successfully w/o any NPEs or weirdness with leader election / recoveries.
>> >>>>>>
>> >>>>>> b. The base_url property is stored in replica state after the upgrade
>> >>>>>>
>> >>>>>> c. A basic client application built with SolrJ 8.7.0 can load cluster state info directly from ZK and query the 8.8.1 RC1 servers.
>> >>>>>>
>> >>>>>> d. Same client app built with SolrJ 8.8.0 works as well.
>> >>>>>>
>> >>>>>> As this bug-fix release is primarily needed to address a SolrJ back-compat break (SOLR-15145) and unfortunately our smoke tester framework does not test for backcompat of older SolrJ against the RC, I ask others to please test rolling upgrades of servers (ideally multi-node clusters) running pre-8.8.0 to this RC if possible. Also, please try client applications that are using an older SolrJ, esp. those that load cluster state directly from ZK.
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>>
>> >>>>>> Tim
>> >>>>>>
>> >>>>
>> >>>> --
>> >>>> Anshum Gupta
>>
>> --
>> -----------------------------------------------------
>> Noble Paul
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>

--
Anshum Gupta