Package: knot-resolver
Version: 5.3.1-1+deb11u1

An armhf bullseye machine experienced several rapid, unexpected power
outages, automatically rebooting each time, with kresd starting
immediately on each boot.

After the third boot, I see these mdb errors and warnings, which
result in a permanent failure of the service to start:

Aug 03 12:55:38 host systemd[1]: Started Knot Resolver daemon.
Aug 03 12:55:39 host kresd[441]: [ta_update] refreshing TA for .
Aug 03 12:55:49 host kresd[441]: [ta_update] active refresh failed for . with rcode: 2
Aug 03 12:55:49 host kresd[441]: [ta_update] next refresh for . in 2.4 hours
Aug 03 12:56:18 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:18 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:18 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:19 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:19 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:19 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:20 host kresd[441]: [tls] Using ephemeral TLS credentials
Aug 03 12:56:20 host kresd[441]: [tls] RFC 7858 OOB key-pin (0): pin-sha256=""
Aug 03 12:56:21 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:21 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:21 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:25 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:25 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:25 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: mdb.c:2433: Assertion 'mp->mp_pgno != pgno' failed in mdb_page_touch()
Aug 03 12:56:32 host systemd[1]: kresd@1.service: Main process exited, code=killed, status=6/ABRT
Aug 03 12:56:32 host systemd[1]: kresd@1.service: Failed with result 'signal'.
Aug 03 12:56:32 host systemd[1]: kresd@1.service: Scheduled restart job, restart counter is at 1.
Aug 03 12:56:32 host systemd[1]: Stopped Knot Resolver daemon.
Aug 03 12:56:32 host systemd[1]: Starting Knot Resolver daemon...
Aug 03 12:56:32 host kresd[496]: mdb.c:2433: Assertion 'mp->mp_pgno != pgno' failed in mdb_page_touch()
Aug 03 12:56:32 host systemd[1]: kresd@1.service: Main process exited, code=killed, status=6/ABRT
Aug 03 12:56:32 host systemd[1]: kresd@1.service: Failed with result 'signal'.
Aug 03 12:56:32 host systemd[1]: Failed to start Knot Resolver daemon.
Aug 03 12:56:33 host systemd[1]: kresd@1.service: Scheduled restart job, restart counter is at 2.
Aug 03 12:56:33 host systemd[1]: Stopped Knot Resolver daemon.
Aug 03 12:56:33 host systemd[1]: Starting Knot Resolver daemon...
Aug 03 12:56:33 host kresd[501]: mdb.c:2433: Assertion 'mp->mp_pgno != pgno' failed in mdb_page_touch()
Aug 03 12:56:33 host systemd[1]: kresd@1.service: Main process exited, code=killed, status=6/ABRT
Aug 03 12:56:33 host systemd[1]: kresd@1.service: Failed with result 'signal'.
Aug 03 12:56:33 host systemd[1]: Failed to start Knot Resolver daemon.

Presumably, this has to do with some sort of corruption of one of these
two files:

/var/cache/knot-resolver/data.mdb
/var/cache/knot-resolver/lock.mdb

They are on a single filesystem (mounted at /) that uses btrfs, and
the system was running kernel 5.10.0-22-armmp when this happened.
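
For poking at the damaged files before discarding them, the LMDB
command-line tools (packaged as lmdb-utils on Debian, if I remember
right) can open the cache directory read-only; whether they cope with
this particular corruption is another question.  Roughly, with both
kresd@1 and kres-cache-gc stopped, and the copy destination just an
example path:

    # print environment and freelist statistics for the suspect cache
    mdb_stat -ef /var/cache/knot-resolver

    # try a compacting copy; if this also trips over the corruption,
    # that points at the on-disk data rather than at kresd itself
    mkdir /tmp/kresd-cache-copy
    mdb_copy -c /var/cache/knot-resolver /tmp/kresd-cache-copy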

As a workaround, I simply did:

    rm /var/cache/knot-resolver/*.mdb
    systemctl restart kresd@1.service kres-cache-gc.service

and the service ran fine from then on (albeit with an initially cold
DNS cache).
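
In retrospect, it would probably have been a bit safer to stop
everything that maps the cache first, and to set the files aside for
later inspection instead of deleting them; roughly (the destination
directory is just an example):

    systemctl stop kresd@1.service kres-cache-gc.service
    mkdir -p /root/kresd-cache-broken
    # keep the suspect files around for diagnosis instead of unlinking
    mv /var/cache/knot-resolver/*.mdb /root/kresd-cache-broken/
    systemctl start kres-cache-gc.service kresd@1.service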

It would be better for the system to run with a cold DNS cache than
for it not to run at all.  If there is this kind of failure in
accessing the mdb file, it might be more robust to just give up on
the mdb file (setting it aside for diagnosis, or merely unlinking it)
and start over.  I'm not exactly sure how to ensure that happens
cleanly if there are multiple processes accessing the same corrupted
mdb file, though.
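
Short of fixing this inside kresd, one crude way to approximate it
from the outside might be a systemd drop-in plus a small cleanup
unit, so that the cache is only discarded once kresd has actually
fallen over (I think OnFailure= only fires once the restart counter
is exhausted, though I haven't tested this, and it would trigger on
any failure, not just cache corruption; all names and paths below are
illustrative):

    # /etc/systemd/system/kresd@.service.d/cache-reset.conf
    [Unit]
    OnFailure=kresd-cache-reset.service

    # /etc/systemd/system/kresd-cache-reset.service
    [Unit]
    Description=Set aside the kresd LMDB cache after a crash and restart

    [Service]
    Type=oneshot
    # move the broken files aside for diagnosis rather than unlinking them
    ExecStart=/bin/sh -c 'mkdir -p /var/cache/knot-resolver-broken && mv /var/cache/knot-resolver/*.mdb /var/cache/knot-resolver-broken/'
    ExecStart=/usr/bin/systemctl restart kres-cache-gc.service kresd@1.service

A real fix would of course belong in kresd itself (detect the broken
cache, set it aside, and reopen a fresh one), but something along
these lines would at least keep the resolver running after this kind
of crash.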

         --dkg
