Package: knot-resolver
Version: 5.3.1-1+deb11u1

An armhf bullseye machine received several rapid and unexpected power outages, automatically rebooting each time, with kresd starting immediately on each boot.
After the third boot, I see the following LMDB errors and warnings in the journal, resulting in a permanent failure of the service to start:

Aug 03 12:55:38 host systemd[1]: Started Knot Resolver daemon.
Aug 03 12:55:39 host kresd[441]: [ta_update] refreshing TA for .
Aug 03 12:55:49 host kresd[441]: [ta_update] active refresh failed for . with rcode: 2
Aug 03 12:55:49 host kresd[441]: [ta_update] next refresh for . in 2.4 hours
Aug 03 12:56:18 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:18 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:18 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:19 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:19 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:19 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:20 host kresd[441]: [tls] Using ephemeral TLS credentials
Aug 03 12:56:20 host kresd[441]: [tls] RFC 7858 OOB key-pin (0): pin-sha256=""
Aug 03 12:56:21 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:21 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:21 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:25 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:25 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:25 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: [cache] LMDB error: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
Aug 03 12:56:32 host kresd[441]: mdb.c:2433: Assertion 'mp->mp_pgno != pgno' failed in mdb_page_touch()
Aug 03 12:56:32 host systemd[1]: kresd@1.service: Main process exited, code=killed, status=6/ABRT
Aug 03 12:56:32 host systemd[1]: kresd@1.service: Failed with result 'signal'.
Aug 03 12:56:32 host systemd[1]: kresd@1.service: Scheduled restart job, restart counter is at 1.
Aug 03 12:56:32 host systemd[1]: Stopped Knot Resolver daemon.
Aug 03 12:56:32 host systemd[1]: Starting Knot Resolver daemon...
Aug 03 12:56:32 host kresd[496]: mdb.c:2433: Assertion 'mp->mp_pgno != pgno' failed in mdb_page_touch()
Aug 03 12:56:32 host systemd[1]: kresd@1.service: Main process exited, code=killed, status=6/ABRT
Aug 03 12:56:32 host systemd[1]: kresd@1.service: Failed with result 'signal'.
Aug 03 12:56:32 host systemd[1]: Failed to start Knot Resolver daemon.
Aug 03 12:56:33 host systemd[1]: kresd@1.service: Scheduled restart job, restart counter is at 2.
Aug 03 12:56:33 host systemd[1]: Stopped Knot Resolver daemon.
Aug 03 12:56:33 host systemd[1]: Starting Knot Resolver daemon...
Aug 03 12:56:33 host kresd[501]: mdb.c:2433: Assertion 'mp->mp_pgno != pgno' failed in mdb_page_touch()
Aug 03 12:56:33 host systemd[1]: kresd@1.service: Main process exited, code=killed, status=6/ABRT
Aug 03 12:56:33 host systemd[1]: kresd@1.service: Failed with result 'signal'.
Aug 03 12:56:33 host systemd[1]: Failed to start Knot Resolver daemon.

Presumably, this has to do with some sort of corruption of one of these two files:

/var/cache/knot-resolver/data.mdb
/var/cache/knot-resolver/lock.mdb

They are on a single filesystem (mounted at /) that uses btrfs, and the system was running kernel 5.10.0-22-armmp when this happened.

As a workaround, I simply did:

rm /var/cache/knot-resolver/*.mdb
systemctl restart kresd@1.service kres-cache-gc.service

and the service ran fine from then on (albeit with an initially cold DNS cache). It would be better for the system to run with a cold DNS cache than for it not to run at all.
If this kind of failure occurs while accessing the mdb file, it might be more robust to simply give up on the file (setting it aside for diagnosis, or merely unlinking it) and start over with an empty cache. I'm not exactly sure how to ensure that happens cleanly if multiple processes are accessing the same corrupted mdb file, though.

--dkg
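For what it's worth, the single-process case could be sketched as a pre-start hook along these lines. This is only a sketch under assumptions: the function name check_cache is hypothetical, and a failing full read by mdb_copy (from the lmdb-utils package) is used as a corruption heuristic, which will not necessarily catch every corruption mode and does not address the multi-process coordination question above.

```shell
# Hypothetical pre-start hook (e.g. run from ExecStartPre=): if the LMDB
# cache environment looks unreadable, set its files aside for diagnosis so
# kresd can start with a cold cache instead of crash-looping.
check_cache() {
    cache_dir="$1"
    # Assumption: mdb_copy (lmdb-utils) reads the whole environment, so a
    # nonzero exit is treated as a sign of corruption.
    if [ -e "$cache_dir/data.mdb" ] && ! mdb_copy "$cache_dir" >/dev/null 2>&1; then
        # Keep the damaged files around for later inspection rather than
        # unlinking them outright.
        quarantine="$cache_dir/corrupt.$(date +%s)"
        mkdir -p "$quarantine"
        mv "$cache_dir"/*.mdb "$quarantine"/
        return 1   # signal that the cache was reset
    fi
    return 0
}
```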