Hey everybody!

Here's the logfile from today's bug meeting. In short: we are
in very good shape for release candidates in the next week or
so.

Where possible, updates have been made to the tickets themselves,
so consider this log more for posterity than anything else.

Ticket count: 11
* 6 test suite issues (1 of which is a dup)
* 1 minor feature issue (Fauxton, PR is up)
* 1 Windows issue (non-blocking)
* 3 release-related chore/documentation tickets

-Joan

11:07 <+Wohali> let's get started
11:08 <+Wohali> #703 is brand new and I've never seen it before. Because it's a 
PR from a repo we don't control we don't have the logfiles.
11:09 <+Wohali> however this is the config:set timeout that vatamane was talking 
about in irc yesterday
11:09 <+davisp> For #703 lets add a dump of the _stats endpoint when we fail
11:09 <+davisp> Right, its likely that it was file_server2 or error_logger that 
got backed up
11:09 <+davisp> Will make a note of that on the ticket
11:09 <+Wohali> probably file_server2 since the ,true at the end right?
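
(Context: the trailing ",true" is the persist flag on config:set, which is why a 
backed-up file server is the prime suspect. Alongside the _stats dump davisp 
mentions, a direct probe of the two suspect processes could look something like 
the sketch below. Illustrative only, not code from the suite; note the Erlang 
registered name is file_server_2.)

    %% Illustrative probe, not code from the suite: report the message
    %% queue length of the registered processes suspected of backing up.
    dump_suspect_queues() ->
        [{Name, queue_len(Name)} || Name <- [file_server_2, error_logger]].

    queue_len(Name) ->
        case whereis(Name) of
            undefined ->
                not_registered;
            Pid ->
                case erlang:process_info(Pid, message_queue_len) of
                    {message_queue_len, Len} -> Len;
                    undefined -> no_longer_alive
                end
        end.
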
11:10 <+Wohali> ok moving on
11:10 <+Wohali> #701. couchdb_1283 go poopie again
11:10 <+Wohali> 14:35 <+davisp> vatamane: Yeah, looks like we should be able to 
do a meck:expect on the compaction function, then do some message passing and 
then just make the last call meck:passthrough(Args) which carries on with the 
original implementation.
11:10 <+Wohali> i think this is relevant?
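
(Context: here is the quoted meck idea sketched out; compaction_mod:compact/1 is 
a placeholder, not the real module and function the test would hook.)

    %% Illustrative sketch of the quoted approach, with placeholder names.
    hook_compaction(TestPid) ->
        ok = meck:new(compaction_mod, [passthrough]),
        meck:expect(compaction_mod, compact, fun(Arg) ->
            %% tell the test the compaction call has started
            TestPid ! {compaction_started, self()},
            %% block until the test has finished its checks
            receive resume -> ok end,
            %% then carry on with the original implementation
            meck:passthrough([Arg])
        end).
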
11:11 <+rnewson> "PR from a repo we don't control"?
11:11 <+Wohali> rnewson: travis has a private key in the .travis.yml file that 
is hooked to the apache/couchdb repo
11:11 <+Wohali> PRs from other repos (like cloudant/) can't use the private 
creds so the envvar doesn't get set
11:11 <+rnewson> oh
11:11 <+davisp> And likely that wouldn't have helped much here
11:12 <+davisp> Assuming my theory is anywhere near correct
11:12 <+davisp> At least, that's the only time I've ever seen it error out in 
production
11:12 <+Wohali> same thing affects #701
11:12 <+davisp> Ticket updated
11:13 <+davisp> I'll work on 701 today. I know what the issue is there
11:13 <+Wohali> ok
11:13 <+Wohali> assigning paul
11:13 <+davisp> The compaction process finishes before we get a chance to 
suspend it
11:13 <+davisp> Hence bad arg when trying to suspend a dead process
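
(Context: assuming the suspend is erlang:suspend_process/1, it raises badarg on a 
pid that has already exited. A guard like the sketch below avoids the crash, 
though the meck hook quoted earlier is the more robust fix because it keeps the 
compaction alive until the test is ready. Illustrative only, not the actual 
patch.)

    %% Illustrative guard, not the actual fix: skip pids that have already
    %% exited, and tolerate the remaining race with a catch.
    maybe_suspend(Pid) ->
        case is_process_alive(Pid) of
            true ->
                try
                    true = erlang:suspend_process(Pid),
                    suspended
                catch
                    error:badarg -> already_finished
                end;
            false ->
                already_finished
        end.
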
11:13 <+Wohali> moving on
11:13 <+Wohali> #593 is mine, i'm waiting on my Fauxton PR to be reviewed and 
+1'd. garren gets back Monday. michellep hasn't been available
11:14 <+Wohali> the erlang code has landed already
11:14 <+Wohali> #695
11:14 <+Wohali> haven't seen this one anywhere useful yet
11:14 <+Wohali> i.e. no couch.logs to review
11:15 <+davisp> vatamane: Is that the one you duplicated locally and added 
longer timeouts for?
11:15 <+Wohali> vatamane is on vacation today and isn't here
11:15 <+Wohali> 15:07 < vatamane> I won't be around on Friday (vacation)
11:15 <+davisp> Oh right
11:15 <+davisp> Am looking at PRs to see if he just forgot to mention the ticket
11:15 <+Wohali> garren is on vacation today as well, and chewbranca is out for 
another couple of weeks
11:16 <+Wohali> ok
11:16 <+davisp> He is?
11:16 <+Wohali> 23:38 < chewbranca> davisp: alright, I've wrapped up my review 
on the ddoc_cache PR. And on that note, I'm officially on vacation for the next 
few weeks, so you might have trouble getting me to do a third round of review 
;-)
11:16 <+davisp> I think he's back unless he's leaving again
11:17 <+davisp> Ah, he got back the other day. But its like 8a his time so 
wouldn't expect to see him around
11:17 <+Wohali> ah that was Jun 30 so yeah
11:17 < jaydoane> I can try to repro 695 using a slow docker container
11:18 <+Wohali> ok #574 is one that really troubles me
11:18 <+Wohali> it's been repeating a whole lot for over a month and no action
11:18 <+davisp> Wohali: jaydoane: Ahh, the one that Nick reproduced was 633 by 
setting the IO bandwidth to 5KiB
11:18 <+Wohali> yeah 633 is closed already
11:18 <+Wohali> unless it recurs
11:18 <+davisp> jaydoane: But you might try a similar thing and see if its 
similar
11:18 <+davisp> Right
11:18 <+davisp> the command he used should be in backlog here. I'll try and 
find it
11:19 < ASFBot> jo...@atypical.net master * 42f26d5 (NOTICE) 
https://gitbox.apache.org/repos/asf?p=couchdb.git;h=42f26d5 :
11:19 < ASFBot> >> Explicitly mention Facebook "BSD+Patents" license in NOTICE 
per LEGAL-303
11:19 < jaydoane> I was unable to repro 574 using extremely low disk IO
11:19 <+davisp> here it is: VBoxManage bandwidthctl ${VM} set Limit --limit 5KB
11:19 <+Wohali> is my analysis of 574 valid?
11:20 <+Wohali> it doesn't look to me like a disk IO issue
11:20 <+Wohali> the failure is in couch_att somewhere
11:21 <+davisp> Haven't read it all but I agree it doesn't appear to be IO 
related
11:21 <+davisp> Seems like a race between process tear down in that something 
dies because of that too large error and cascades badly
11:21 < jaydoane> actually found this in my logs, but it's not the same stack 
trace in the ticket https://www.irccloud.com/pastebin/XKbSWEHH/
11:22 <+Wohali> for me this is my #1 issue for us to look at before release 
since it's actually affecting replication
11:22 <+Wohali> the rest feel like mainly badly written test cases that need 
help
11:23 <+Wohali> jaydoane, rnewson: given davisp's limited cycles, could either of 
you look at this one?
11:23 <+davisp> I'll try and find time but might not be till later today or 
Monday if no one else gets to it
11:24 <+rnewson> my cycles are pretty limited too tbh
11:24 <+Wohali> ok
11:24 <+davisp> Though for anyone not familiar with the MP parsing code that's 
a deep dark cave of despair. Feel free to get to it before me
11:24 < jaydoane> I spent the better part of yesterday trying to repro 574, but 
got nothing so far -- not sure if slowing the tests will help, but I can keep 
trying (maybe soaking)
11:24 <+rnewson> for bugs that I can go "aha, I know what that is" I can turn 
out a fix
11:24 <+rnewson> a deep dive into MP parsing is the worst
11:25 <+Wohali> :(
11:25 <+Wohali> ok
11:25 <+Wohali> #674
11:25 <+davisp> I'm inclined to bump 674. I've added a log message for it but I 
don't believe its failed since I've added it
11:25 <+Wohali> paul merged more debug logging a week ago and i don't think 
we've seen it since
11:26 <+davisp> Where bump == not block the release
11:26 <+Wohali> ok
11:26 <+Wohali> unless it recurs, I'm in favour
11:26 <+davisp> Yap, hopefully if it recurs that log message will lead us to 
the fix
11:26 <+davisp> And or at least make us comfortable deleting the dumb assertion
11:27 <+Wohali> done
11:27 <+Wohali> #673
11:27 <+Wohali> this appears to be an issue with how JS is doing the server 
reset
11:27 <+Wohali> we shouldn't be returning control to the test script prior to 
the node actually being up
11:27 <+Wohali> but, somehow, we are
11:27 <+Wohali> I'll take this one
11:28 <+Wohali> I recently reworked the restart() logic a bit to wait longer, 
to try and fix some of the stats tests (which I ultimately disabled)
11:28 <+Wohali> so it's possible that something I did is causing issues? i 
dunno.
11:28 <+davisp> Oooh
11:28 <+Wohali> #669
11:28 <+davisp> I think I see it
11:28 <+Wohali> oh?
11:28 <+davisp> We're checking the local port and not the clustered port in 
ensure_all_nodes_alive
11:28 <+Wohali> ahh
11:29 <+davisp> and couch_httpd comes up before chttpd
11:29 <+Wohali> local port comes up first, right
11:29 <+davisp> Adding a note and a link.
11:29 <+Wohali> thanks, that's an easy fix
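
(Context: the shape of the fix is to wait on the clustered chttpd port instead of 
the node-local couch_httpd one, since the local port comes up first. Wherever the 
real helper lives, the check would look roughly like the sketch below; names and 
host/port handling here are illustrative.)

    %% Illustrative only: poll the clustered (chttpd) port until it accepts
    %% connections, rather than the node-local couch_httpd port.
    wait_for_chttpd(Host, Port, Retries) when Retries > 0 ->
        case gen_tcp:connect(Host, Port, [], 500) of
            {ok, Sock} ->
                ok = gen_tcp:close(Sock),
                ok;
            {error, _} ->
                timer:sleep(250),
                wait_for_chttpd(Host, Port, Retries - 1)
        end;
    wait_for_chttpd(_Host, _Port, 0) ->
        {error, chttpd_not_up}.
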
11:29 <+Wohali> weird that sometimes we're too fast, and other times too slow :)
11:29 <+Wohali> I think #669 and #673 are dupes
11:30 <+davisp> Me too and noted as such
11:30 <+Wohali> your belt and suspenders style is dashing!
11:30 <+davisp> on 673
11:30 <+Wohali> saw, and thx
11:31 <+Wohali> #683 is a chore, someone has to go and read git log and write 
up something human consumable.
11:31 <+Wohali> if no one else volunteers I can take it...
11:31 <+Wohali> since the motion passed I will be nuking and recreating the 
2.1.x branch
11:31 <+davisp> +1
11:31 <+Wohali> once we get all of this stuff closed out
11:32 <+davisp> Cause that means i can merge the new ddoc_cache soon
11:32 <+davisp> Yap
11:32 <+Wohali> yeah :) soon
11:32 <+Wohali> need to branch the other repos, too
11:32 <+Wohali> and it'd be nice to get tags on the other repos if we're 
pointing to stuff
11:32 <+Wohali> i'll add a ticket for that so I don't forget
11:33 <+davisp> +1
11:33 <+Wohali> done, #704
11:33 <+Wohali> #642, good news if you didn't see it: 
https://repo-nightly.couchdb.org/
11:34 <+Wohali> jenkins issues are almost all completely worked out so we'll 
have a top level master/ and 2.1.x/ tree soon
11:34 <+davisp> Saw it. As far as I'm concerned you should just merge that when 
you're comfortable. I dunno anyone else that knows Jenkins well enough to have 
an opinion
11:34 <+Wohali> appreciate it, I can't actually test it on other branches 
unless the file actually exists, so I'll avoid the RTC model for this one thing
11:35 <+Wohali> but again we have the very latest packages for each branch (we 
don't keep back versions), plus the latest 10 source code tarballs
11:35 <+Wohali> for dev@ consumption only in line wiht the ASF requirements on 
this stuff
11:35 <+davisp> yep
11:35 <+davisp> Ahh, ok fair enough
11:35 <+Wohali> we also got real bintray deb/rpm repos for actual releases, 
which is great
11:35 <+Wohali> and they're working on getting us access to docker for 
apache/couchdb as our image namespace
11:35 <+davisp> Cool
11:36 <+Wohali> only thing left for me to follow-up on is snaps, and I have the 
credentials I need
11:36 <+Wohali> we won't be pushing latest snaps, just released builds
11:36 <+davisp> You're saying a lot of words that I know in other contexts...
11:36 <+Wohali> basically, lots of semi-authorized real package/container 
goodness.
11:37 <+Wohali> it's been a long haul.
11:37 <+davisp> That bit I know :D
11:37 <+Wohali> and #698 we just got a report that the fixed Windows package I 
made doesn't work on Windows Server 2016 :/
11:37 <+Wohali> so, i just downloaded that OS so I can test on it
11:37 <+Wohali> it'll get low priority
11:37 <+Wohali> and can be fixed after release
11:38 <+Wohali> any questions?
11:38 <+Wohali> and, any one mind if I email the summary of this meeting to 
dev@ ?
11:38 < jaydoane> good idea
