Package: prometheus-node-exporter-collectors
Version: 0.0~git20221011.8f6be63-1
Severity: important
Tags: security
X-Debbugs-Cc: Salvatore Bonaccorso <[email protected]>, [email protected], 
Debian Security Team <[email protected]>

As requested by Salvatore, lowering the priority and avoiding an embargo.
-----

Hello, happy new year, and thanks.

This looks like an apt deadlock, which prevents updates, unattended upgrades,
and therefore critical security updates on systems where they are enabled.
(Yes, we can manually kill the offending apt_info.py process to temporarily
solve the issue, but this is not a proper solution.)
As it prevents security updates, and although it is unlikely to happen on a
large scale or to be practically exploitable, I feel this requires real
attention.


Symptoms:
Persistent apt update locking error:
# apt update
Reading package lists... Done
E: Could not get lock /var/lib/apt/lists/lock. It is held by process 65553 
(python3)
N: Be aware that removing the lock file is not a solution and may break your 
system.
E: Unable to lock directory /var/lib/apt/lists/

# 1 hour later, same issue, same holding PID 65553
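
A minimal sketch for checking, without running apt, whether the lock is
still held. It assumes that apt guards /var/lib/apt/lists/lock with an
fcntl-style advisory lock, and that the caller may open the file read-write
(typically root). The helper name lock_is_held is hypothetical. Note that
the probe briefly takes the lock itself when nobody else holds it.

```python
import errno
import fcntl
import os


def lock_is_held(path):
    """Return True if another process holds an fcntl lock on `path`."""
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt: fails at once if someone else has the lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True
            raise
        return False
    finally:
        # Closing the descriptor also drops the probe lock if we got it.
        os.close(fd)


# Hypothetical use on an affected host (as root):
#   lock_is_held("/var/lib/apt/lists/lock")
```

Polling this from a monitoring job would show how long the lock stays
held, which is the symptom described above.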

# Processes involved:
# ps aux |grep pyth
root        1259  0.0  0.1 121076 27528 ?        Ssl  Jan06   0:00 
/usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgra>
root       65553  0.0  0.4  89640 76908 ?        S    12:09   0:03 python3 
/usr/share/prometheus-node-exporter-collectors/apt_info.py
ee         70395  0.0  0.2 124164 42844 ?        Sl   12:35   0:00 
/bin/python3.11 /home/ee/.vscode-oss/extensions/ms-python.python> (not 
suspected)

# ps aux |grep apt
root       65551  0.0  0.0   9552  4252 ?        Ss   12:09   0:00 /bin/bash -c 
/usr/share/prometheus-node-exporter-collectors/apt_>
root       65553  0.0  0.4  89640 76908 ?        S    12:09   0:03 python3 
/usr/share/prometheus-node-exporter-collectors/apt_info.>
root       65554  0.0  0.0   2464   884 ?        S    12:09   0:00 sponge 
/var/lib/prometheus/node-exporter/apt.prom
_apt       65814  0.0  0.0  27192 13204 ?        S    12:09   0:00 
/usr/lib/apt/methods/https
_apt       65815  0.0  0.0  24420 10236 ?        S    12:09   0:00 
/usr/lib/apt/methods/http
_apt       65816  0.0  0.0  27192 13204 ?        S    12:09   0:00 
/usr/lib/apt/methods/https
_apt       65817  0.0  0.0  24420 10272 ?        S    12:09   0:00 
/usr/lib/apt/methods/http
_apt       65819  0.0  0.0  17572  7624 ?        S    12:09   0:00 
/usr/lib/apt/methods/gpgv
_apt       65826  0.0  0.0  27192 13464 ?        S    12:09   0:00 
/usr/lib/apt/methods/https
_apt       65829  0.0  0.0  24420 10292 ?        S    12:09   0:00 
/usr/lib/apt/methods/http
_apt       66110  0.0  0.0  17528  7500 ?        S    12:10   0:00 
/usr/lib/apt/methods/store
_apt       66112  0.0  0.0  18436  8636 ?        S    12:10   0:00 
/usr/lib/apt/methods/rred
_apt       66113  0.0  0.0  18576  8860 ?        S    12:10   0:00 
/usr/lib/apt/methods/rred

The deadlock is apparently between the unattended-upgrade process (1259) and
the prometheus triptych: 65551/53/54.


# 65553 seems to be the culprit - as apt update tells us
# strace -p 65553
strace: Process 65553 attached
pselect6(29, [12 13 14 16 18 20 22 24 26 28], [], NULL, {tv_sec=0, 
tv_nsec=499419645}, NULL) = 0 (Timeout)
pselect6(29, [12 13 14 16 18 20 22 24 26 28], [], NULL, {tv_sec=0, 
tv_nsec=500000000}, NULL) = 0 (Timeout)
... repeats 'forever' ....
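
A minimal sketch for seeing what the descriptors in the pselect6 read set
point to, assuming a Linux /proc filesystem and enough privileges to read
/proc/<pid>/fd (typically root for another user's process). The helper name
describe_fds is hypothetical.

```python
import os


def describe_fds(pid):
    """Map each open descriptor of `pid` to the target of its /proc symlink."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            continue  # descriptor was closed while we were listing
    return targets


# Hypothetical use on the stuck process:
#   for fd, target in sorted(describe_fds(65553).items()):
#       print(fd, target)
```

Pipes show up as "pipe:[inode]". Both ends of a pipe share the same inode,
so comparing the inodes with those of the /usr/lib/apt/methods/* processes
should tell which worker each descriptor is connected to.
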
All fds are pipes; I could not get more information before the process crashed
due to my diagnostic attempts.
An apt/python/prometheus collector specialist should be able to identify these
pipes quickly and draw further conclusions from the following state:

# gdb -p 65553 and bt:
#0  0x00007fa4bf65f794 in __GI___select (nfds=29, readfds=0x7ffc24f8e7c0, 
writefds=0x7ffc24f8e840, exceptfds=0x0,
   timeout=0x7ffc24f8e750) at ../sysdeps/unix/sysv/linux/select.c:69
#1  0x00007fa4bebad338 in pkgAcquire::Run(int) () from 
/lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#2  0x00007fa4becb1485 in AcquireUpdate(pkgAcquire&, int, bool, bool) () from 
/lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#3  0x00007fa4becb1976 in ListUpdate(pkgAcquireStatus&, pkgSourceList&, int) ()
  from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#4  0x00007fa4bed32fe1 in ?? () from 
/usr/lib/python3/dist-packages/apt_pkg.cpython-311-x86_64-linux-gnu.so
#5  0x0000000000521cf0 in ?? ()
#6  0x000000000053983c in PyObject_Vectorcall ()
#7  0x000000000052a570 in _PyEval_EvalFrameDefault ()
#8  0x000000000052222b in PyEval_EvalCode ()
#9  0x0000000000647f07 in ?? ()
#10 0x00000000006457cf in ?? ()
#11 0x0000000000651920 in ?? ()
#12 0x000000000065166b in _PyRun_SimpleFileObject ()
#13 0x0000000000651494 in _PyRun_AnyFileObject ()
#14 0x000000000065022f in Py_RunMain ()
#15 0x00000000006248b7 in Py_BytesMain ()
#16 0x00007fa4bf58818a in __libc_start_call_main (main=main@entry=0x624820, 
argc=argc@entry=2,
   argv=argv@entry=0x7ffc24f8f298) at ../sysdeps/nptl/libc_start_call_main.h:58
#17 0x00007fa4bf588245 in __libc_start_main_impl (main=0x624820, argc=2, 
argv=0x7ffc24f8f298, init=<optimized out>,
   fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc24f8f288) 
at ../csu/libc-start.c:381
#18 0x0000000000624751 in _start ()

This seems to suggest that the location of the deadlock, for 65553, is:
(apt_info.py)
def _main():
   cache = apt.cache.Cache()

   # First of all, attempt to update the index. If we don't have permission
   # to do so (or it fails for some reason), it's not the end of the world,
   # we'll operate on the old index.
   with contextlib.suppress(apt.cache.LockFailedException,
                            apt.cache.FetchFailedException):
       cache.update()  # <<<<<<<<<<<< VERY LIKELY
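
If the hang is indeed in cache.update(), one possible mitigation is to bound
the time that step may take. The sketch below is not the collector's code
and not a proposed upstream fix. It runs an arbitrary callable in a forked
child process and terminates the child if it has not finished in time. The
names run_with_timeout and timeout_s are hypothetical, and the "fork" start
method makes it Linux/Unix only.

```python
import multiprocessing


def run_with_timeout(func, timeout_s):
    """Run func() in a forked child; return True if it exited cleanly in time."""
    ctx = multiprocessing.get_context("fork")
    child = ctx.Process(target=func)
    child.start()
    child.join(timeout_s)
    if child.is_alive():
        child.terminate()  # SIGTERM first
        child.join(5)
        if child.is_alive():
            child.kill()  # SIGKILL as a last resort
            child.join()
        return False
    return child.exitcode == 0


# Hypothetical use in a collector (requires python3-apt, not run here):
#   import apt
#   run_with_timeout(lambda: apt.cache.Cache().update(), timeout_s=300)
#   cache = apt.cache.Cache()  # open the cache afterwards, updated or not
```

A child process is used here because the backtrace shows the wait happening
inside libapt-pkg, outside the Python interpreter. When the child exits, any
lock it held is released with it, and the parent can carry on with the old
index.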



I could not confirm the precise location, as trying to get a python backtrace 
from the process generated a SEGV:
(gdb) call PyRun_SimpleString("print('toto\n')") # to test
'PyRun_SimpleString' has unknown return type; cast the call to its declared 
return type
(gdb) call (void*)PyRun_SimpleString("print('toto\n')")
Program received signal SIGSEGV, Segmentation fault.
# Oops... will not get a python trace now.

Fortunately, I collected the core (~27MB). If you are interested, tell me; I am
keeping it for a few weeks:
#0  0x000000000063187a in ?? ()
#1  0x00000000006349b2 in PyImport_AddModuleObject ()
#2  0x0000000000634688 in PyImport_AddModule ()
#3  0x000000000063e323 in PyRun_SimpleStringFlags ()
(but I feel it is unrelated and not very useful; I may be wrong)

I feel I can't help more now, so throwing the potato 😉

Best,
Eric 'Steve' Estievenart




-- System Information:
Debian Release: bookworm/sid
  APT prefers unstable
  APT policy: (990, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.0.0-6-amd64 (SMP w/4 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, 
TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), 
LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages prometheus-node-exporter-collectors depends on:
ii  moreutils                 0.67-1
ii  prometheus-node-exporter  1.5.0-1+b1
ii  python3-apt               2.5.0
ii  systemd-sysv              252.4-1

Versions of packages prometheus-node-exporter-collectors recommends:
ii  ipmitool       1.8.19-4
ii  jq             1.6-2.1
ii  nvme-cli       2.2.1-3
ii  python3        3.11.1-1
ii  smartmontools  7.3-1+b1

prometheus-node-exporter-collectors suggests no packages.

-- no debconf information
