I tend to agree with the sentiment expressed here. If failing tests are disabled, I have to wonder about the value of the tests in the first place.

On May 21, 2017 4:17:47 AM GMT+02:00, "Eli Stevens (Gmail)" <[email protected]> wrote:
>Please take this as a single data point from an end user:
>
>This approach will probably result in my company declining to upgrade
>to CouchDB 2.1, and instead waiting for 2.2 in hopes that the test
>suite will be in a more stable state by then. This is somewhat ironic,
>given that my company is also sponsoring the work to have Ubuntu
>packages produced automatically for builds that have a passing test
>suite.
>
>I realize that this might be somewhat nonsensical, given that I don't
>have a good handle on what the testing situation was surrounding 2.0,
>but our expectation is that going from 1.6.1 to 2.0 should be a
>support and stability improvement, which is important for us. Right
>now the biggest source of test failures for my company's product is
>CouchDB 1.6.1 instability. We've built up a layer of retry/backoff
>functionality over our couch library, but it still leaks out
>sometimes.
>
>So we're cautiously optimistic about transitioning to 2.0, but I'm
>really unenthusiastic about a release process that treats intermittent
>errors as the responsibility of the end user to mitigate. I'd much
>rather see master made stable (why did the replication scheduler land
>if it wasn't release-ready?) and that ship as 2.1, even if it takes
>longer to get there.
>
>Again, just input from an end user.
>
>Thanks for considering,
>Eli
>
>On Sun, May 14, 2017 at 12:21 AM, Jan Lehnardt <[email protected]> wrote:
>>> *just* for the 2.1 branch
>>
>> Absolutely, just for that branch, master will keep all failing tests until
>> we sort them out proper.
>>
>> Thanks Paul for elaborating here, that’s precisely my thinking as well.
>>
>> Joan, thanks for highlighting that “just disabling all failing tests” won’t
>> do (e.g. in case of couchjs sometimes crashing), we’ll continue to have to
>> live with that until we find out what’s wrong.
>>
>> I was mainly thinking about the randomly failing compaction daemon type
>> tests.
>>
>> Best
>> Jan
>> --
>>
>>> On 14. May 2017, at 05:46, Paul Davis <[email protected]> wrote:
>>>
>>> Joan,
>>>
>>> Reading this while on ops but my understanding was that the disabling
>>> was *just* for the 2.1 branch. Other than that I agree 100%. Other
>>> than wondering why you haven't merged the log upload :P That's awesome
>>> and I agree will help significantly. And I agree that the tests aren't
>>> necessarily bad, it's just that with a distributed/async system the
>>> whole "works on my machine" turns into a "works on all developer
>>> machines" but then also "blows up on way underpowered VMs" which
>>> means our tests have some fun timing issues.
>>>
>>> Given that the tests are randomly failing vs a test or two that's
>>> always failing I'm not that concerned with just flagging the issue as
>>> "We're aware of it, we're working on fixing it, but we'd like to get
>>> some work into a consumable release for people."
>>>
>>> Seem reasonable?
>>>
>>> On Sat, May 13, 2017 at 8:01 PM, Joan Touzet <[email protected]> wrote:
>>>> Hi everyone,
>>>>
>>>> I'm +/-0 on this only because there's a little ambiguity in steps 2 and 4
>>>> I'd like to clear up. This email is part test status report and
>>>> part clarification, so I apologize in advance for the length.
>>>>
>>>> It is absolutely _almost_ time we get 2.1 out the door.
>>>>
>>>> Step 2 is the equivalent of sweeping all our possible problems under
>>>> the rug. The failing tests aren't necessarily failing because we have
>>>> a bad test suite. In fact, just last week I found a genuine race
>>>> condition leading to a broken Couch from one of these test cases[1].
>>>> I don't want to just sweep everything under the rug to get a release
>>>> out the door like we did for 2.0.0; if we'd held on for a few more weeks
>>>> for that release we might have found and fixed that bug (and a few
>>>> others, too.)
>>>>
>>>> It's worth noting that we can't disable /all/ of the failing tests for
>>>> a 2.1 release either; at least one of the failures can best be described
>>>> as "couchjs just sometimes segfaults." So unless we're ready to just
>>>> disable the entire JS test suite... ;) And for the detractors out there,
>>>> there are more EUnit than JS failing test cases right now (13 vs. 6)!
>>>>
>>>> Step 4, for me, *must* include re-enabling all of the failing tests as
>>>> soon as possible (or, alternately, only disabling them on the 2.1.x
>>>> branch.) A PR I intend to land tomorrow, which has +1s from Paul and
>>>> Jan[2], will upload couch.log files from Travis and Jenkins when a test
>>>> fails to a central CouchDB for further analysis. Prior to this,
>>>> determining the actual failure required getting lucky and having one of
>>>> the tests fail on your machine. With the exception of the compression
>>>> daemon tests (which I *just* increased the timeout on just 4 days ago[3]),
>>>> for most of these test failures we just need more data. Disabling the tests
>>>> now that we finally have useful CI telemetry is like launching a fleet of
>>>> satellites to monitor global climate, then banning the agency responsible
>>>> for them from monitoring them for vital data. :D
>>>>
>>>> Thanks for reading. Let's move forward on 2.1...carefully.
>>>>
>>>> -Joan
>>>>
>>>> [1] https://github.com/apache/couchdb/commit/81ee7c5ac71e617a03e967b4fc5d0358f4ba9459
>>>> [2] https://github.com/apache/couchdb/pull/514
>>>> [3] https://github.com/apache/couchdb/commit/ca4761c6177748f6c87bd072939f7b3eb6fa1edd#diff-41b21ba8ff04bec904f235212d7c4de0
>>>>
>>>> ----- Original Message -----
>>>> From: "Jan Lehnardt" <[email protected]>
>>>> To: "dev" <[email protected]>
>>>> Sent: Thursday, 11 May, 2017 1:41:35 PM
>>>> Subject: 2.1
>>>>
>>>> Hi all,
>>>>
>>>> we should get CouchDB 2.1 out soon and the test suite situation is a
>>>> somewhat annoying blocker, so I’m proposing something that might
>>>> sound unusual: disable the failing tests.
>>>>
>>>> All test failures are intermittent and we must absolutely address
>>>> this, but since nobody picked this up since February, I think we need a
>>>> new plan.
>>>>
>>>> The one other issue is that the replication manager was merged
>>>> recently and is still fairly new code, so I’m proposing this:
>>>>
>>>> 1. Fork 2.1.x off of master just before the replication scheduler merge.
>>>>
>>>> 1.1. backport any other fixes in master to 2.1.x that happened after
>>>> the replication scheduler.
>>>>
>>>> 2. Disable all failing tests.
>>>>
>>>> 3. Start the release procedure.
>>>>
>>>> 4. Fix tests on master for 2.2, which then also can include the
>>>> replication scheduler.
>>>>
>>>> If there are no objections, I’m happy to prepare the 2.1.x branch
>>>> early next week.
>>>>
>>>> Best
>>>> Jan
>>>> --
>>
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
>>

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
