Your message dated Tue, 4 Aug 2020 21:43:51 -0700
with message-id <CAFHYt569tFw+WG_9aAmVZSnLg6mkm3fHr9=s6_uczqjfoaf...@mail.gmail.com>
and subject line Re: lintian: Bug#779224: Run checks in parallel
has caused the Debian Bug report #779224,
regarding lintian: run checks in parallel
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)

--
779224: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779224
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: lintian
Version: 2.5.30+deb8u3
Severity: important

Hi,

Lintian itself suffers from an issue similar to the "fork(): out of
memory" issue in #776658.

In a nutshell, lintian will process a "heavy-weight" package group,
pushing its memory usage up. Once the group is done, lintian will clear
its cache, but Perl will *not* release the memory to the OS. This
leaves lintian at a high memory usage, and when it then forks, the
children inherit this usage. This is especially bad during unpacking,
where we may spawn up to N concurrent workers (N = "$(nproc) + 1", or
--parallel N), *all* inheriting the memory consumption. With none of
them calling exec anymore[1], we can now "trivially" end up at a memory
usage of: (N+1) * (MaxRes).

Concrete example: lindsay.d.o has 2 cores and ~4GB of RAM. In today's
run, lintian was using 1.2G of RAM. Given the above, it would be
reserving up to 4 * 1.2 GB of RAM. Unsurprisingly, this also triggered
several warnings for DSA.

For "regular" users of lintian, this is probably a relatively minor
problem, as most of them run lintian on a single package group.
Therefore, they will not experience the same accumulation or "memory
explosion". However, we can trigger similar issues there, as collections
might load the same caches as the main process. The usage here is
contained and short-lived, so it might require a very unlikely race
condition to be as bad as the example described above.

AFAICT, to solve this, we basically have to do one or more of:

 * Bring general memory usage down.
   - Difficult, given Perl's reluctance to release memory. Here our
     best bet is to prevent the memory from being used in the first
     place.

 * Do checks in subprocesses, so memory will be reclaimed between
   package groups.
   - Extra complexity, and basically we would be unable to use it for
     parallel processing. Or rather, if we did, we would just use the
     same amount of memory during checks as well.
~Niels

[1] I introduced this behaviour a while back to optimise our unpacking
code. Previously we had a start-up time of ~0.1s or so for some of our
collections, because they had to reload all the Perl modules. Though,
even if we go back to the "exec" variant, we are not guaranteed to be
home free.
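[Editor's note: the "checks in subprocesses" idea above can be sketched
as follows. This is illustrative Python, not lintian's actual Perl code;
the function and package names are hypothetical. The point is only the
mechanism: memory allocated during a check lives in a short-lived child
process and is returned to the OS when the child exits, so the parent's
resident set stays small between package groups.]

```python
import os

def run_check_in_child(check, package):
    """Fork, run the check in the child, and reap it.

    Any memory the check allocates lives only in the child process,
    so the long-running parent does not accumulate it.
    """
    pid = os.fork()
    if pid == 0:
        # Child: run the check, then exit without returning to the caller.
        status = 0
        try:
            check(package)
        except Exception:
            status = 1
        os._exit(status)
    # Parent: wait for the child and report success/failure.
    _, wait_status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(wait_status) == 0

def heavy_check(package):
    # Simulate a check that builds a large in-memory cache; the
    # allocation below exists only in the child process.
    cache = ["x" * 1024 for _ in range(100_000)]
    assert package

print(run_check_in_child(heavy_check, "lintian_2.5.30.deb"))
```

As the bullet above notes, the trade-off is that if the children run
concurrently, the peak usage during the checks themselves is unchanged.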
--- End Message ---
--- Begin Message ---
Hi,

On Sun, Mar 29, 2020 at 9:23 PM Felix Lechner <[email protected]> wrote:
>
> It is slower by a factor of two or more.

Actually, forking ~200 checks came with a huge performance penalty.
Execution times were 25% to 20x longer when using IO::Async. Many
people noticed it.

As long as Lintian uses Perl, checks will probably not run in parallel.
The number of checks is also rising steadily, which works further
against the strategy.

Additional references may be found in commits 150bc265 and cb45b444, as
well as Bug#966122 and Bug#966368.

The bug was originally about the memory footprint, but that has not
been a problem lately.

Closing this bug.

Kind regards
Felix Lechner
--- End Message ---

