Hi Andor,

For testManyChildWatchersAutoReset, I was the one who put the 10 min timeout on the test itself. I wanted to see the logs, and the problem is that when a test is timed out by ant (default 15 min), the logs aren't captured. I agree that it became much flakier. I've just pushed a PR to increase the timeout to 14 min.
Also, I'm still looking at the slowness and posted some thoughts in jira https://issues.apache.org/jira/browse/ZOOKEEPER-3046

On Wed, Jul 18, 2018 at 9:09 PM Michael Han <h...@apache.org> wrote:

> Thanks Pat for promptly fixing this!
>
> I have no idea of the "failed to get" symptoms. Probably we could give it
> more days and see if the pattern recurs? If not might be a transient infra
> issue...
>
> On Wed, Jul 18, 2018 at 11:16 AM, Patrick Hunt <ph...@apache.org> wrote:
>
> > Ok, I committed a change that seems to address the main failure:
> > https://github.com/apache/zookeeper/commit/06b9507ab78a1a055b8f467846c15791600b72ee
> >
> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
> >
> > However I do notice some oddness in the sense that for some jobs/runs it
> > fails to get the information from the REST interface, even though it's fine
> > for most of them, take a look, any ideas?
> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/456/console
> >
> > [ZooKeeper-Find-Flaky-Tests] $ /bin/bash /tmp/jenkins4452773653790031730.sh
> > ERROR:__main__:failed to get:
> > https://builds.apache.org/job/ZooKeeper-trunk/108/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
> > ERROR:__main__:failed to get:
> > https://builds.apache.org/job/ZooKeeper-trunk/104/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
> > ERROR:__main__:failed to get:
> > https://builds.apache.org/job/ZooKeeper-trunk/100/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
> >
> >
> > Notice that it doesn't complain about job 107 (etc...)
> >
> > Any ideas on this? Have you seen this before? Perhaps we should open an
> > INFRA jira?
> >
> > Patrick
> >
> > On Wed, Jul 18, 2018 at 10:52 AM Patrick Hunt <ph...@apache.org> wrote:
> >
> > > FYI, created this:
> > > https://issues.apache.org/jira/browse/INFRA-16785
> > > for the security warnings, not sure if that's causing the issue. Likely
> > > it's the recent jenkins upgrade, looking into it a bit...
> > >
> > > Patrick
> > >
> > >
> > > On Wed, Jul 18, 2018 at 9:48 AM Michael Han <h...@apache.org> wrote:
> > >
> > >> Hi Andor,
> > >>
> > >> >> I suspect it should succeed eventually if we were to increase the
> > >> >> timeout even more. But is that correct? Bug or infrastructure issue?
> > >>
> > >> You could set up a dedicated git branch with all patches (e.g. the one in
> > >> ZOOKEEPER-2251) you want to apply and I can set up a dedicated Jenkins job
> > >> that points to this branch and stress test the entire unit test suite.
> > >> Some tests are only flaky when they ran on Apache infrastructure and when
> > >> they ran together.
> > >>
> > >> It would be interesting to figure out what cause this test fail. Since
> > >> same test works reliably in 3.4, there must be some commits in 3.5 that we
> > >> could possibly blame...
> > >>
> > >> >> I'm going to raise a ticket on that if somebody willing to fix it.
> > >>
> > >> I just had a brief look before Jenkins is down. Looks like python was
> > >> complaining about some SSL stuff and I suspect if we upgrade to use later
> > >> version of python (3.x) it might work. I'll try that later when Jenkins is
> > >> back.
> > >>
> > >>
> > >> On Wed, Jul 18, 2018 at 8:42 AM, Andor Molnar <an...@cloudera.com.invalid>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > *branch-3.4*
> > >> >
> > >> > I've taken a quick look at our Jenkins builds and in terms of flaky tests,
> > >> > it looks like branch-3.4 is in a pretty good shape. The build hasn't failed
> > >> > for 5-6 days on all JDKs which I think is pretty awesome.
> > >> >
> > >> > *branch-3.5*
> > >> >
> > >> > This branch is in very bad condition. Which is quite unfortunate given
> > >> > we're in the middle of stabilising it. :)
> > >> > Especially on JDK8, last successful build was 11 days ago. JDK9 (50%
> > >> > failing) and JDK10 (30% failing) are looking better in the last 10 builds.
> > >> >
> > >> > Interestingly (apart from a few quite rare ones) it looks there's only 1
> > >> > test which is quite nasty on this branch: testManyChildWatchersAutoReset
> > >> >
> > >> > There's a Jira about fixing it and a fix has been merged by increasing the
> > >> > timeout of the test, but having a bug on the branch is also possible
> > >> > causing the test to fail even with 10 min timeout.
> > >> >
> > >> > I wasn't able to repro the failing test on my machine (Mac and CentOS7), it
> > >> > always finished in 30-40 seconds maximum. On jenkins slaves it shows the
> > >> > following:
> > >> >
> > >> > *JDK 8:*
> > >> >
> > >> > Report creation timed out.
> > >> >
> > >> >
> > >> > *JDK 9:*
> > >> >
> > >> > Build Number  testManyChildWatchersAutoReset  Report
> > >> > 351           45.604   <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/351/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 350           600.337  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/350/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 349           21.904   <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/349/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 348           583.063  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/348/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 347           600.325  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/347/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 346           600.383  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/346/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 345           600.362  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/345/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 344           21.139   <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/344/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 343           24.031   <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/343/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 342           584.200  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/342/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 341           600.327  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/341/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 340           600.323  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/340/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 339           23.737   <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/339/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 338           600.406  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/338/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 337           547.004  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/337/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 336           600.393  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/336/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 335           N/A      <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/test_results_analyzer/>
> > >> > 334           373.955  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/334/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> >
> > >> >
> > >> > *JDK 10:*
> > >> >
> > >> > Build Number  testManyChildWatchersAutoReset  Report
> > >> > 110           364.945  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/110/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 109           543.983  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/109/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 108           388.182  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/108/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 107           600.446  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/107/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 106           600.025  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/106/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 105           535.046  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/105/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 104           600.306  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/104/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 103           474.005  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/103/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 102           560.925  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/102/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 101           600.328  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/101/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 100           558.547  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/100/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 99            600.397  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/99/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 98            600.414  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/98/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 97            430.383  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/97/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 96            564.064  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/96/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 95            600.357  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/95/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 94            432.435  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/94/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 93            596.378  <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/93/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > 92            39.242   <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/92/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> >
> > >> >
> > >> > It takes ages to complete on Jenkins for some reason and it looks like it
> > >> > ends quite frequently close to the limit, so I suspect it should succeed
> > >> > eventually if we were to increase the timeout even more. But is that
> > >> > correct?
> > >> > Bug or infrastructure issue?
> > >> >
> > >> > *master / 3.6*
> > >> >
> > >> > Pretty much the same as 3.5. I haven't seen testManyChildWatchersAutoReset
> > >> > failing on this branch with JDK8 which is a bit confusing, but other then
> > >> > that I see the same pattern on JDK9 and JDK10. Unable to generate the above
> > >> > reports here, because Test Result Analyzer keep timeouting for me, but I'll
> > >> > follow-up when I have them.
> > >> >
> > >> > Btw.
> > >> > Flaky Test report has been broken for 10 days, I'm going to raise a
> > >> > ticket on that if somebody willing to fix it. (I'm planning to do so.)
> > >> > It would be nice to see the report working again, because if my
> > >> > observations are correct, we don't have too many annoying tests apart from
> > >> > the one mentioned.
> > >> >
> > >> > Thanks,
> > >> > Andor
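To put a number on "ends quite frequently close to the limit", here is a rough sketch that tallies the testManyChildWatchersAutoReset durations quoted in the thread. The values are copied from Andor's JDK 9 and JDK 10 listings (the N/A entry for JDK 9 build 335 is left out). Treating 600 as the 10 min test timeout in seconds, and counting anything within 10% of it as "near", are assumptions made for illustration only.

```python
TIMEOUT_S = 600.0  # assumption: the 10 min per-test timeout, in seconds

# Durations as quoted in the thread, newest build first.
jdk9 = [45.604, 600.337, 21.904, 583.063, 600.325, 600.383, 600.362, 21.139,
        24.031, 584.200, 600.327, 600.323, 23.737, 600.406, 547.004, 600.393,
        373.955]
jdk10 = [364.945, 543.983, 388.182, 600.446, 600.025, 535.046, 600.306,
         474.005, 560.925, 600.328, 558.547, 600.397, 600.414, 430.383,
         564.064, 600.357, 432.435, 596.378, 39.242]


def summarize(durations, limit=TIMEOUT_S):
    """Count runs that reached the limit and runs that came within 10% of it."""
    hit = sum(1 for d in durations if d >= limit)
    near = sum(1 for d in durations if 0.9 * limit <= d < limit)
    return {"runs": len(durations), "hit_limit": hit, "near_limit": near}


if __name__ == "__main__":
    for name, data in (("JDK 9", jdk9), ("JDK 10", jdk10)):
        print(name, summarize(data))
```

On these numbers, 8 of 17 JDK 9 runs and 7 of 19 JDK 10 runs reached the limit. The JDK 9 runs split between about 20-45 s and the timeout region, while the JDK 10 runs are slow almost across the board.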