Re: Speeding up the test builds

Piotr Nowojski Fri, 08 Apr 2022 03:36:30 -0700

I have checked and it looks like this change (increasing forkCount) gives a
small improvement in the `test_ci_build core` module of 5-10 minutes. Let's
maybe double check this after FLINK-26995 is merged.


Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-26995


czw., 31 mar 2022 o 16:29 Chesnay Schepler <[email protected]> napisał(a):

> Ah. I missed one very important line.
>
>  > [unit] tests would require less memory and thus we would be able to
> increase the number of forks for them
>
> That's definitely a good idea, and it should be trivial to implement it.
> It would be interesting to see how much that would give us.
>
> Note that in the long-term we could look deeper parallelizing the tests
> using junit5 (again, what Francesco already started in the table tests),
> removing the need for forks entirely.
>
> On 31/03/2022 15:40, Chesnay Schepler wrote:
> > > we decided to limit the number of testing forks/disable fork reuse
> > for almost all tests
> >
> > IT cases were generally not reusing forks for as long as I can
> > remember. A few modules opted into fork-reuse; most didn't.
> > The Table API recently enabled fork-reuse, started causing OOMs due to
> > memory leaks, which have since been fixed and fork-reuse has been
> > enabled again.
> >
> > > machine that has 7GB of RAM in total
> >
> > This is indeed the case for the azure-provided machines (used for e2e
> > tests and contributor builds).
> >
> > The machines we use for apache/flink / PRs have 64GB ram in total,
> > spread over 7 agents, so about ~8 per agent and roughly equivalent to
> > the azure machines if you account for some overhead.
> > I'm not aware of any explicit assignment of off-heap memory.
> > This has worked fine in the past, and anytime we ran into OOM errors
> > we were eventually able to pinpoint the cause to some memory leak in
> > Flink.
> > I thus assume that the OOMs we've recently seen are again due to some
> > new leak.
> >
> >
> > Overall I appreciate the sentiment of using less resources for unit
> > tests, but I don't see how it would really address the issue.
> > You still have the potential for all agents running ITCases at the
> > same time, leaving us in the status-quo.
> > It could reduce the likelihood of OOMs, but so long as leaks are there
> > it is inevitable that we hit one.
> > (And as mush as I hate it, if there is a leak I'd kinda prefer that to
> > be more visible, as it seems the only way to get people to look into
> > leaks at all is getting into a situation where everyone is seriously
> > annoyed.)
> >
> > Instead we could try decreasing the general memory usage slightly
> > (even 256mb less would give us a nice buffer on the alibaba machines),
> > and I would hazard a guess that it'd barely reduce testing times
> > (since most ITCases shouldn't hit that limit anyway).
> >
> > I do agree that in the long-term we should strive towards re-using
> > forks, and I know that Francesco also looked into executing test cases
> > in parallel using JUnit5 to speed things up.
> > But there are a number of classes that make this quite difficult;
> > singletons like FileSystem/GlobalConfiguration, anything using static
> > stuff like context environments (not sure if this still applies),
> > legacy testing code, class-loading bugs in libraries etc etc.
> >
> > I would actually split ITCases into 2 categories depending on whether
> > they work with fork-reuse.
> >
> > On 31/03/2022 14:48, Piotr Nowojski wrote:
> >> Hi devs,
> >>
> >> Recently we were plagued with OOM errors and as a result we decided to
> >> limit the number of testing forks/disable fork reuse for almost all
> >> tests.
> >> Currently AFAIK each JVM fork has assigned 4GB of memory (2GB heap
> >> and 2GB
> >> off heap) and we are running two forks, on a machine that has 7GB of
> >> RAM in
> >> total.
> >>
> >> What I would like to discuss is to introduce test categories, for
> >> example
> >> small and large tests, where small tests would require less memory
> >> and thus
> >> we would be able to increase the number of forks for them. We could
> >> probably achieve this differentiation via different means (annotations,
> >> test naming conventions, etc), but I would propose to just re-use our
> >> pre-existing convention of suffixing more "unity" testing classes with
> >> `Test` suffix, while more "integration like" classes with `ITCase`
> >> suffix.
> >> We could keep running `ITCase`'s with two forks 4GB memory each,
> >> while for
> >> example unit tests we could run with four forks 2GB memory each.
> >>
> >> What do you think?
> >>
> >> Best,
> >> Piotrek
> >>
> >
>
>

Re: Speeding up the test builds

Reply via email to