Package: postgresql-13
Version: 13.7-0+deb11u1
Severity: important

We have found severe regressions when upgrading from buster to
bullseye on two of our PostgreSQL servers.

It seems like, in busy workloads, the JIT actually leaks memory. Like
a lot. In this screenshot of a yearly Grafana dashboard, you can see
memory usage is fairly regular until the upgrade (early May) at which
point the server starts regularly swapping and eventually OOM'ing:

https://gitlab.torproject.org/tpo/tpa/team/uploads/41f8850ecc4b4170f56901b4018a9870/image.png

The internal ticket we filed about this has all the gory details,
which are probably too much for this bug report:

https://gitlab.torproject.org/tpo/tpa/team/-/issues/40815

We also had similar issues on other servers, for example:

https://gitlab.torproject.org/tpo/tpa/team/-/issues/40814

While this may seem like a one-off thing that affects only certain
workloads (we certainly have other PostgreSQL servers that do not
suffer from this problem), when it *does* affect the workload, it's
pretty catastrophic. Hence the "important" severity ("major effect on
the usability of a package, without rendering it completely unusable
to everyone").

Also, it took us a long time to track down this problem... it's
basically only because the release notes of an unrelated project
(PuppetDB) happened to mention a similar bug report that we got the
hint this could be the problem:

https://tickets.puppetlabs.com/browse/PDB-5452

... which makes me think this problem might be more widespread than a
few workloads. It seems like DSA also had problems with the upgrade on
the sources.debian.org server, which, granted, is also a huge server,
but I don't see why size alone should necessarily be a problem for
PostgreSQL...

Past PostgreSQL upgrades have been basically flawless for us: the
procedure is a little disruptive (essentially a dump/restore), but
apart from that, we have never seen such a huge performance
regression. So I figured it was worth at least a bug report.

I'm not sure what should come out of this; I can't help but think this
is a bug in the JIT, but debugging it is far beyond my capacity. So
maybe this could be forwarded upstream? In the meantime, maybe this
could be addressed "simply" by adding a note to the Debian bullseye
release notes.

One should also check whether this behavior still occurs in newer
releases: we briefly considered upgrading to 14 to see if this was
still happening, before finding the JIT trick (turning the JIT off),
but have not done so (yet?).

Thank you for your attention,

a.

-- System Information:
Debian Release: 11.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-17-amd64 (SMP w/2 CPU threads)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages postgresql-13 depends on:
ii  debconf [debconf-2.0]  1.5.77
ii  libc6                  2.31-13+deb11u3
ii  libgcc-s1              10.2.1-6
ii  libgssapi-krb5-2       1.18.3-6+deb11u1
ii  libicu67               67.1-7
ii  libldap-2.4-2          2.4.57+dfsg-3+deb11u1
ii  libllvm11              1:11.0.1-2
ii  libpam0g               1.4.0-9+deb11u1
ii  libpq5                 13.7-0+deb11u1
ii  libselinux1            3.1-3
ii  libssl1.1              1.1.1n-0+deb11u3
ii  libstdc++6             10.2.1-6
ii  libsystemd0            247.3-7
ii  libuuid1               2.36.1-8+deb11u1
ii  libxml2                2.9.10+dfsg-6.7+deb11u2
ii  libxslt1.1             1.1.34-4+deb11u1
ii  locales                2.31-13+deb11u3
ii  locales-all            2.31-13+deb11u3
ii  postgresql-client-13   13.7-0+deb11u1
ii  postgresql-common      225
ii  ssl-cert               1.1.0+nmu1
ii  tzdata                 2021a-1+deb11u5
ii  zlib1g                 1:1.2.11.dfsg-2+deb11u2

Versions of packages postgresql-13 recommends:
pn  sysstat  <none>

postgresql-13 suggests no packages.

-- debconf information:
  postgresql-13/postrm_purge_data: true
