Hi Gus,
I encountered those failures on Fedora 26. Everything seems fine on
Fedora 25, but there are a bunch of failures on Fedora 26. Most of these
failures are Solr losing its ZK connections and hence timing out.
Initially, I thought these were related to the kernel version, but now I
think this is distribution specific. What is a bit baffling to me is that I
remember these tests running well on Fedora 26 about a month back (so
maybe a recent update broke it?). I'm looking into what the underlying
reason could be.

Interesting that you could reproduce these on Ubuntu 17.04. I'll take a
look at that distro version as well. I'm also using the AMD Threadripper
1950X these days, and I see that -Dtests.jvms=24 gives me the best overall
times.
Regards,
Ishan


On Mon, Oct 16, 2017 at 1:36 AM, Gus Heck <[email protected]> wrote:

> @Ishan, re failures.. I'm seeing very common test failures on an Ubuntu
> 17.04 box that I built very recently, but that is apparently a lower kernel
> version than the one that worked for you (4.10.0-37-generic x86_64)... would
> be interested to know what distro you are using... and what the differences
> between the versions you used were. Failures were more common with things
> cranked up to 30 processors (success 1 in 10 times, note: box has a 32 thread
> AMD processor). Failures are less common with "auto", which yields 4 (about
> 50/50 chance of success). I am now working on figuring out how common
> failures are with it tuned down to 1 thread (but that's very slow).
>
> On Sun, Oct 15, 2017 at 3:01 PM, Ishan Chattopadhyaya <
> [email protected]> wrote:
>
>> Thanks Steve, it was indeed the problem!
>>
>> On Sun, Oct 15, 2017 at 8:42 PM, Ishan Chattopadhyaya <
>> [email protected]> wrote:
>>
>>> Thanks a lot, Steve! I'll take a look :-)
>>>
>>> On Sun, Oct 15, 2017 at 8:37 PM, Steve Rowe <[email protected]> wrote:
>>>
>>>> Hi Ishan,
>>>>
>>>> (I see you pinged me on #solr-dev IRC, but I was AFK for a while,
>>>> sorry.)
>>>>
>>>> I think the change I made to buildAndPushRelease.py, which fixed a
>>>> problem I had with building the 7.0.1 RC that sounds suspiciously like what
>>>> you’re encountering, might help?  I didn’t commit to branch_6_6, but here’s
>>>> the branch_7_0 commit:
>>>> <https://git1-us-west.apache.org/repos/asf?p=lucene-solr.git;a=commit;h=8d6c3889>
>>>>
>>>> Here’s the branch_6_6 version:
>>>>
>>>>   result = p.poll()
>>>>   if result is not None:
>>>>     msg = '    FAILED: %s [see log %s]' % (command, LOG)
>>>>
>>>> poll() returns None to indicate that the process has not yet
>>>> terminated.  So what’s AFAICT happening to you is that the process *is*
>>>> terminating in time and returning 0 (for success), which is not None,
>>>> and that triggers the failure.  This is wrong.  My patch switches this
>>>> code to use wait() instead of poll():
>>>>
>>>>   try:
>>>>     result = p.wait(timeout=120)
>>>>     if result != 0:
>>>>       msg = '    FAILED: %s [see log %s]' % (command, LOG)
>>>>       print(msg)
>>>>       raise RuntimeError(msg)
>>>>   except TimeoutExpired:
>>>>     msg = '    FAILED: %s [timed out after 2 minutes; see log %s]' % (command, LOG)
>>>>
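For anyone following along, the poll() versus wait() behaviour described above can be tried in isolation. This is only a minimal sketch, not the code in buildAndPushRelease.py: the helper name run_checked, its parameters, and the wording of the messages are hypothetical, and it assumes Python 3.3 or later, where wait() accepts a timeout.

```python
import subprocess
import sys
from subprocess import TimeoutExpired


def run_checked(command, log_path, timeout=120):
    """Hypothetical helper: run `command` (an argument list), append its
    output to `log_path`, and raise RuntimeError on failure or timeout."""
    with open(log_path, 'ab') as log:
        p = subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT)
        try:
            # wait() blocks until the process exits (or the timeout expires)
            # and returns the exit code, so 0 means success.
            result = p.wait(timeout=timeout)
        except TimeoutExpired:
            # The child is still running; stop it before reporting.
            p.kill()
            p.wait()
            raise RuntimeError('FAILED: %s [timed out after %s seconds; see log %s]'
                               % (command, timeout, log_path))
    if result != 0:
        raise RuntimeError('FAILED: %s [exit code %s; see log %s]'
                           % (command, result, log_path))
    return result


def poll_after_exit():
    """Show why `poll() is not None` is the wrong failure test."""
    p = subprocess.Popen([sys.executable, '-c', 'pass'])
    p.wait()
    # After a clean exit poll() returns 0, which is not None, so a check
    # written as `if result is not None: fail` would reject a success.
    return p.poll()
```

Calling run_checked([sys.executable, '-c', 'pass'], 'run.log') should return 0, while a command that exits non-zero or outlives the timeout raises RuntimeError instead.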
>>>>
>>>> --
>>>> Steve
>>>> www.lucidworks.com
>>>>
>>>> > On Oct 15, 2017, at 10:45 AM, Ishan Chattopadhyaya <
>>>> [email protected]> wrote:
>>>> >
>>>> > Update on the RC: I'm trying to build one for some time now. The
>>>> latest situation is that all the steps seem to be going well, but still the
>>>> script fails:
>>>> https://gist.github.com/chatman/fa307c3e8253d2014d0e7bb381328396
>>>> >
>>>> > Looking into what could be going wrong. Any help is most welcome.
>>>> >
>>>> > @Shalin, I remember you mentioned that you found a way to build the
>>>> artifacts and sign them as separate steps. Can you please share how
>>>> to do so? It would save me a lot of time; currently each of my attempts
>>>> builds the artifacts from scratch.
>>>> >
>>>> > Thanks,
>>>> > Ishan
>>>> >
>>>> > On Sat, Oct 14, 2017 at 11:09 PM, Erick Erickson <
>>>> [email protected]> wrote:
>>>> > Thanks! I ran precommit and test after the commit and all's well....
>>>> >
>>>> > On Sat, Oct 14, 2017 at 8:27 AM, Ishan Chattopadhyaya <
>>>> [email protected]> wrote:
>>>> > No problem, I'll pick up your commit. :-)
>>>> >
>>>> > On Sat, Oct 14, 2017 at 8:51 PM, Erick Erickson <
>>>> [email protected]> wrote:
>>>> > Committed now.
>>>> >
>>>> >
>>>> >
>>>> > On Sat, Oct 14, 2017 at 8:19 AM, Erick Erickson <
>>>> [email protected]> wrote:
>>>> > Michael: Good catch. Have I mentioned lately that Git and I don't get
>>>> along? Apparently I was in some weird state when I tried to push.
>>>> >
>>>> > Ishan: Many apologies, but I'll have to push again, is it too late to
>>>> re-spin?
>>>> >
>>>> > On Sat, Oct 14, 2017 at 7:56 AM, Ishan Chattopadhyaya <
>>>> [email protected]> wrote:
>>>> > Here are the logs of two failed runs, FYI.
>>>> > http://textsearch.io/tests.log.gz (kernel: 4.13.5-200.fc26.x86_64)
>>>> > http://textsearch.io/tests2.log.gz (kernel: 4.13.5-200.fc26.x86_64)
>>>> >
>>>> > On Sat, Oct 14, 2017 at 8:15 PM, Ishan Chattopadhyaya <
>>>> [email protected]> wrote:
>>>> > FYI, I've been struggling to run tests for the past 4-5 hours. About
>>>> 10-15 of them failed on every run; I tried all the branches on a variety of
>>>> machines (Intel i7 Haswell-E, Ryzen 1700, Threadripper 1950X). The
>>>> JDK version on all of these is 8u144.
>>>> >
>>>> > Finally, figured out that all my machines had the latest
>>>> 4.12.14-300.fc26.x86_64 or 4.13.5-200.fc26.x86_64 kernels. When I
>>>> downgraded the kernel to 4.11.6-201.fc25.x86_64, the tests started running
>>>> as usual. Now, I'll try to build the RC for 6.6.2 on this kernel. Is this a
>>>> known issue?
>>>> >
>>>> > On Sat, Oct 14, 2017 at 6:26 AM, Erick Erickson <
>>>> [email protected]> wrote:
>>>> > Done both for 6.6 and 6x
>>>> >
>>>> > On Fri, Oct 13, 2017 at 5:16 PM, Ishan Chattopadhyaya <
>>>> [email protected]> wrote:
>>>> > Sure Erick, please go ahead.
>>>> > I'll start the release later today.
>>>> > Thanks,
>>>> > Ishan
>>>> >
>>>> > On Sat, Oct 14, 2017 at 5:44 AM, Erick Erickson <
>>>> [email protected]> wrote:
>>>> > Ishan:
>>>> >
>>>> > I have 11297 ready to rock-n-roll, it's just a matter of pushing it.
>>>> Give me a few.
>>>> >
>>>> > The thing I'm not clear on is what to do with CHANGES.txt. Currently
>>>> it's in 7.0.1 and 7.1.
>>>> >
>>>> > I propose adding a 6.6.2 section to 6x and including it there and
>>>> leaving it in the 7.0.1 and 7.1 sections of master.
>>>> >
>>>> > I'll do it that way, you can change it if you want unless I hear back
>>>> from you sooner.
>>>> >
>>>> > Erick
>>>> >
>>>> > On Fri, Oct 13, 2017 at 4:59 PM, Allison, Timothy B. <
>>>> [email protected]> wrote:
>>>> > Sounds good.  Thank you!
>>>> >
>>>> >
>>>> >
>>>> > From: Ishan Chattopadhyaya [mailto:[email protected]]
>>>> > Sent: Friday, October 13, 2017 5:25 PM
>>>> > To: [email protected]
>>>> > Subject: Re: 6.6.2 Release
>>>> >
>>>> >
>>>> >
>>>> > > Any chance we could get SOLR-11450 in?  I understand if the answer
>>>> is no. 😊
>>>> >
>>>> > Currently, I want to have this release out as soon as possible so as
>>>> to mitigate the risk exposure of the security vulnerability. Since this is
>>>> not committed yet, I'd vote for leaving this out and possibly having it
>>>> included in a later release, if needed.
>>>> >
>>>> > +1 to SOLR-11297.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Sat, Oct 14, 2017 at 2:32 AM, David Smiley <
>>>> [email protected]> wrote:
>>>> >
>>>> > Suggested criteria for bug-fix release issues:
>>>> >
>>>> > * fixes a bug :-)     and doesn't harm backwards-compatibility in the
>>>> process
>>>> >
>>>> > * helps users upgrade to later versions
>>>> >
>>>> > * documentation
>>>> >
>>>> >
>>>> >
>>>> > +1 to SOLR-11297
>>>> >
>>>> >
>>>> >
>>>> > I'm not sure on SOLR-11450.  Seems it might introduce a back-compat
>>>> issue?
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Oct 13, 2017 at 4:40 PM Erick Erickson <
>>>> [email protected]> wrote:
>>>> >
>>>> > I'd also like to get SOLR-11297 in if there are no objections. Ditto
>>>> if the answer is no....
>>>> >
>>>> >
>>>> >
>>>> > It's quite a safe fix though.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Oct 13, 2017 at 1:26 PM, Allison, Timothy B. <
>>>> [email protected]> wrote:
>>>> >
>>>> > Any chance we could get SOLR-11450 in?  I understand if the answer is
>>>> no. 😊
>>>> >
>>>> >
>>>> >
>>>> > Thank you!
>>>> >
>>>> >
>>>> >
>>>> > From: Ishan Chattopadhyaya [mailto:[email protected]]
>>>> > Sent: Friday, October 13, 2017 4:23 PM
>>>> > To: [email protected]
>>>> > Subject: 6.6.2 Release
>>>> >
>>>> >
>>>> >
>>>> > Hi,
>>>> >
>>>> > In light of [0], we need a 6.6.2 release as soon as possible.
>>>> >
>>>> > I'd like to volunteer to RM for this release, unless someone else
>>>> wants to do so or has an objection.
>>>> >
>>>> > Regards,
>>>> >
>>>> > Ishan
>>>> >
>>>> >
>>>> >
>>>> > [0] - https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> >
>>>> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>>> >
>>>> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>>> http://www.solrenterprisesearchserver.com
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> http://www.the111shift.com
>
