Hi Andor,

For testManyChildWatchersAutoReset, I was the one who put the 10min timeout
on the test itself. I wanted to see the logs, and the problem is that when the
test is timed out by ant (default 15min), the logs aren't captured.
I agree that it has become much flakier. I've just pushed a PR to increase the
timeout to 14min.

Also, I'm still looking at the slowness and posted some thoughts in jira
https://issues.apache.org/jira/browse/ZOOKEEPER-3046
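
As a rough check on how often the runs quoted further down hit the 10min
limit, here is a small sketch. The durations are copied from the quoted Test
Result Analyzer output; treating them as seconds and counting anything at or
above 600 as a timeout are assumptions, and the function name is only for
illustration.

```python
TIMEOUT_S = 600.0  # assumed: the 10min per-test timeout, in seconds

# Durations copied from the quoted reports, newest build first.
# None marks the build that was reported as N/A.
JDK9 = [45.604, 600.337, 21.904, 583.063, 600.325, 600.383, 600.362,
        21.139, 24.031, 584.200, 600.327, 600.323, 23.737, 600.406,
        547.004, 600.393, None, 373.955]
JDK10 = [364.945, 543.983, 388.182, 600.446, 600.025, 535.046, 600.306,
         474.005, 560.925, 600.328, 558.547, 600.397, 600.414, 430.383,
         564.064, 600.357, 432.435, 596.378, 39.242]


def hit_timeout(durations, timeout=TIMEOUT_S):
    """Return (runs at or above the timeout, runs with a recorded duration)."""
    ran = [d for d in durations if d is not None]
    timed_out = [d for d in ran if d >= timeout]
    return len(timed_out), len(ran)


for name, durations in (("JDK9", JDK9), ("JDK10", JDK10)):
    timed_out, ran = hit_timeout(durations)
    print(f"{name}: {timed_out} of {ran} runs reached the timeout")
```

With these numbers that is 8 of 17 runs on JDK9 and 7 of 19 on JDK10, and
several more finish only a few seconds under the limit.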

On Wed, Jul 18, 2018 at 9:09 PM Michael Han <h...@apache.org> wrote:

> Thanks Pat for promptly fixing this!
>
> I have no idea of the "failed to get" symptoms. Probably we could give it
> more days and see if the pattern recurs? If not might be a transient infra
> issue...
>
> On Wed, Jul 18, 2018 at 11:16 AM, Patrick Hunt <ph...@apache.org> wrote:
>
> > Ok, I committed a change that seems to address the main failure:
> >
> > https://github.com/apache/zookeeper/commit/06b9507ab78a1a055b8f467846c15791600b72ee
> >
> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
> >
> > However I do notice some oddness in the sense that for some jobs/runs it
> > fails to get the information from the REST interface, even though it's fine
> > for most of them, take a look, any ideas?
> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/456/console
> >
> > [ZooKeeper-Find-Flaky-Tests] $ /bin/bash /tmp/jenkins4452773653790031730.sh
> > ERROR:__main__:failed to get: https://builds.apache.org/job/ZooKeeper-trunk/108/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
> > ERROR:__main__:failed to get: https://builds.apache.org/job/ZooKeeper-trunk/104/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
> > ERROR:__main__:failed to get: https://builds.apache.org/job/ZooKeeper-trunk/100/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
> >
> >
> > Notice that it doesn't complain about job 107 (etc...)
> >
> > Any ideas on this? Have you seen this before? Perhaps we should open an
> > INFRA jira?
> >
> > Patrick
> >
> > On Wed, Jul 18, 2018 at 10:52 AM Patrick Hunt <ph...@apache.org> wrote:
> >
> > > FYI, created this:
> > > https://issues.apache.org/jira/browse/INFRA-16785
> > > for the security warnings, not sure if that's causing the issue. Likely
> > > it's the recent jenkins upgrade, looking into it a bit...
> > >
> > > Patrick
> > >
> > >
> > > On Wed, Jul 18, 2018 at 9:48 AM Michael Han <h...@apache.org> wrote:
> > >
> > >> Hi Andor,
> > >>
> > >> >> I suspect it should succeed eventually if we were to increase the
> > >> >> timeout even more. But is that correct? Bug or infrastructure issue?
> > >>
> > >> You could set up a dedicated git branch with all patches (e.g. the one in
> > >> ZOOKEEPER-2251) you want to apply and I can set up a dedicated Jenkins job
> > >> that points to this branch and stress test the entire unit test suite.
> > >> Some tests are only flaky when they run on Apache infrastructure and when
> > >> they run together.
> > >>
> > >> It would be interesting to figure out what causes this test to fail. Since
> > >> the same test works reliably in 3.4, there must be some commits in 3.5 that
> > >> we could possibly blame...
> > >>
> > >> >> I'm going to raise a ticket on that if somebody willing to fix it.
> > >>
> > >> I just had a brief look before Jenkins went down. It looks like python was
> > >> complaining about some SSL stuff, and I suspect that if we upgrade to a
> > >> later version of python (3.x) it might work. I'll try that later when
> > >> Jenkins is back.
> > >>
> > >>
> > >> On Wed, Jul 18, 2018 at 8:42 AM, Andor Molnar <an...@cloudera.com.invalid> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > *branch-3.4*
> > >> >
> > >> > I've taken a quick look at our Jenkins builds and in terms of flaky tests,
> > >> > it looks like branch-3.4 is in pretty good shape. The build hasn't failed
> > >> > for 5-6 days on all JDKs, which I think is pretty awesome.
> > >> >
> > >> > *branch-3.5*
> > >> >
> > >> > This branch is in very bad condition, which is quite unfortunate given
> > >> > we're in the middle of stabilising it. :)
> > >> > Especially on JDK8, where the last successful build was 11 days ago. JDK9
> > >> > (50% failing) and JDK10 (30% failing) are looking better in the last 10
> > >> > builds.
> > >> >
> > >> > Interestingly (apart from a few quite rare ones) it looks like there's only
> > >> > 1 test which is quite nasty on this branch: testManyChildWatchersAutoReset
> > >> >
> > >> > There's a Jira about fixing it and a fix has been merged that increases
> > >> > the timeout of the test, but it's also possible that a bug on the branch
> > >> > is causing the test to fail even with the 10 min timeout.
> > >> >
> > >> > I wasn't able to repro the failing test on my machines (Mac and CentOS7);
> > >> > it always finished in 30-40 seconds at most. On the Jenkins slaves it shows
> > >> > the following:
> > >> >
> > >> > *JDK 8:*
> > >> >
> > >> > Report creation timed out.
> > >> >
> > >> >
> > >> > *JDK 9:*
> > >> >
> > >> > testManyChildWatchersAutoReset (Test Result Analyzer, ZooKeeper_branch35_java9)
> > >> >
> > >> > Build | Duration (s)
> > >> > ------+-------------
> > >> >   351 |  45.604
> > >> >   350 | 600.337
> > >> >   349 |  21.904
> > >> >   348 | 583.063
> > >> >   347 | 600.325
> > >> >   346 | 600.383
> > >> >   345 | 600.362
> > >> >   344 |  21.139
> > >> >   343 |  24.031
> > >> >   342 | 584.200
> > >> >   341 | 600.327
> > >> >   340 | 600.323
> > >> >   339 |  23.737
> > >> >   338 | 600.406
> > >> >   337 | 547.004
> > >> >   336 | 600.393
> > >> >   335 | N/A
> > >> >   334 | 373.955
> > >> >
> > >> > Per-build report:
> > >> > <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/351/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > (same path with the build number replaced for the other builds)
> > >> >
> > >> >
> > >> > *JDK 10:*
> > >> >
> > >> >
> > >> > testManyChildWatchersAutoReset (Test Result Analyzer, ZooKeeper_branch35_java10)
> > >> >
> > >> > Build | Duration (s)
> > >> > ------+-------------
> > >> >   110 | 364.945
> > >> >   109 | 543.983
> > >> >   108 | 388.182
> > >> >   107 | 600.446
> > >> >   106 | 600.025
> > >> >   105 | 535.046
> > >> >   104 | 600.306
> > >> >   103 | 474.005
> > >> >   102 | 560.925
> > >> >   101 | 600.328
> > >> >   100 | 558.547
> > >> >    99 | 600.397
> > >> >    98 | 600.414
> > >> >    97 | 430.383
> > >> >    96 | 564.064
> > >> >    95 | 600.357
> > >> >    94 | 432.435
> > >> >    93 | 596.378
> > >> >    92 |  39.242
> > >> >
> > >> > Per-build report:
> > >> > <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/110/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
> > >> > (same path with the build number replaced for the other builds)
> > >> >
> > >> >
> > >> > It takes ages to complete on Jenkins for some reason, and it looks like it
> > >> > quite frequently ends close to the limit, so I suspect it should succeed
> > >> > eventually if we were to increase the timeout even more. But is that
> > >> > correct?
> > >> > Bug or infrastructure issue?
> > >> >
> > >> > *master / 3.6*
> > >> >
> > >> > Pretty much the same as 3.5. I haven't seen testManyChildWatchersAutoReset
> > >> > failing on this branch with JDK8, which is a bit confusing, but other than
> > >> > that I see the same pattern on JDK9 and JDK10. I'm unable to generate the
> > >> > above reports here, because Test Result Analyzer keeps timing out for me,
> > >> > but I'll follow up when I have them.
> > >> >
> > >> > Btw. the Flaky Test report has been broken for 10 days. I'm going to raise
> > >> > a ticket on that if somebody is willing to fix it. (I'm planning to do so.)
> > >> > It would be nice to see the report working again, because if my
> > >> > observations are correct, we don't have too many annoying tests apart from
> > >> > the one mentioned.
> > >> >
> > >> > Thanks,
> > >> > Andor
> > >> >
> > >>
> > >
> >
>
