Assuming that not many tests deadlock, I think it should be fine to simply
let the build process deadlock. Even if multiple tests fail consistently,
one would see them one after another. That way we wouldn't have to
build some extra tooling. Moreover, the behaviour would be consistent on
the local machine because the same test would also deadlock there.

Decreasing the build/test time is, in my opinion, more relevant for tests
which actually do pass but run too slowly and is, hence, more of an
orthogonal discussion.
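
For comparison, the JUnit-level alternative described in the quoted mail
below (a timeout that prints the current stack traces when it fires) could
be sketched roughly as follows. This is only a minimal illustration using
plain JDK classes, not an existing Flink or JUnit utility: the class name
TimeoutWithThreadDump and the methods runWithTimeout and dumpAllThreads are
hypothetical, and in practice the logic would presumably be wired in as a
JUnit rule or extension.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a test body with a time limit and, if the limit
// is exceeded, captures the stack traces of all live threads before failing.
final class TimeoutWithThreadDump {

    /** Formats the stack traces of all live threads of this JVM. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append("\" state=")
                    .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Runs testBody; fails with an AssertionError containing a thread dump on timeout. */
    static void runWithTimeout(Runnable testBody, Duration limit)
            throws InterruptedException {
        // Daemon thread so that a hanging test body cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "test-body");
            thread.setDaemon(true);
            return thread;
        });
        Future<?> result = executor.submit(testBody);
        try {
            result.get(limit.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Take the dump before the worker is interrupted in the finally
            // block, so the frames in which the test hangs are still visible.
            String dump = dumpAllThreads();
            System.err.println(dump);
            throw new AssertionError("Test exceeded " + limit + "\n" + dump);
        } catch (ExecutionException e) {
            // Propagate the original failure of the test body.
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw new IllegalStateException(cause);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A test would then call something like
TimeoutWithThreadDump.runWithTimeout(() -> runTheActualTest(), Duration.ofMinutes(1)),
where runTheActualTest is a placeholder. Note that such a dump only covers
the threads of the test JVM itself, not any external processes the test
may have started.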

Cheers,
Till

On Mon, Apr 26, 2021 at 8:25 PM Robert Metzger <rmetz...@apache.org> wrote:

> I was actually recently wondering if we shouldn't rather use timeouts more
> aggressively in JUnit.
> There was recently a case where a number of tests accidentally ran for 5
> minutes, because a timeout was increased to 5 minutes.
> If we had a global limit of 1 minute per test, we would have caught this
> case (and we would encourage people to be careful with CI time). If we are
> going to add some custom timeout infrastructure to JUnit (in Java, not in
> the CI bash scripts ;) ) it should be fairly straightforward to print the
> current stack traces in case of a timeout.
> Another benefit of solving this problem at a JUnit level is that the
> behavior would be the same in all environments (for example when running
> the tests locally).
> The final benefit would be that we would get a list of all tests that are
> timing out (from that module), instead of having the test stall at a random
> test.
>
>
> On Mon, Apr 26, 2021 at 10:49 AM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > +1. I think this rule makes a lot of sense.
> >
> > Cheers,
> > Till
> >
> > On Mon, Apr 26, 2021 at 10:08 AM Arvid Heise <ar...@apache.org> wrote:
> >
> > > +1 from my side.
> > >
> > > We should probably double-check if we really need 4h timeouts on test
> > > tasks in AZP. It feels like 2h would be enough.
> > >
> > > On Mon, Apr 26, 2021 at 9:54 AM Dawid Wysakowicz <dwysakow...@apache.org>
> > > wrote:
> > >
> > > > Hi devs!
> > > >
> > > > I wanted to bring up something that was discussed in a few independent
> > > > groups of people in the past days. I'd like to revise using timeouts in
> > > > our JUnit tests. The suggestion would be not to use them anymore. The
> > > > problem with timeouts is that we have no thread dump and stack traces of
> > > > the system as it hangs. If we were not using a timeout, the CI runner
> > > > would have caught the timeout and created a thread dump which often is a
> > > > great starting point for debugging.
> > > >
> > > > This problem has been spotted e.g. during debugging FLINK-22416[1]. In
> > > > the past thread dumps were not always taken for hanging tests, but it
> > > > was changed quite recently in FLINK-21346[2]. I am happy to hear your
> > > > opinions on it. If there are no objections I would like to add the
> > > > suggestion to the Coding Guidelines[3]
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > >
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-22416
> > > >
> > > > [2] https://issues.apache.org/jira/browse/FLINK-21346
> > > >
> > > > [3] https://flink.apache.org/contributing/code-style-and-quality-java.html#java-language-features-and-libraries
