Hello Chuck,

Could you please send me (privately) your configuration and the latest logs, including the segfault message? How much RAM does the server have? What is the size of the journal database?
What do you mean by "some servers were falling behind the master, as evidenced by their SOA serial number"? Do you have any logs?

FYI, the upcoming Knot DNS 2.7.0 implements ECS functionality.

Thanks,
Daniel

On 2018-07-03 18:46, Chuck Musser wrote:
We're experiencing occasional failures with Knot crashing while running as a slave. The behavior is as follows: the slave will run for 2 months or so and then segfault. Our system automatically restarts the process, but after 15 minutes or less, the segfault happens again. This repeats until we remove the /var/lib/knot/journal and /var/lib/knot/timers directories. This seems to fix it up for a while: a newly started process will run fine for another couple of months.

More details on our setup: these systems serve a little less than a hundred zones, some of which change at a rapid rate. We have configured the servers not to flush the zone data to regular files. The server software is 2.5.7, but with the changes from the "ecs-patch" branch applied.

A while back, I tried a release from the newer branch (I'm pretty sure it was 2.6.4), but I had a problem there where some servers were falling behind the master, as evidenced by their SOA serial number. Diagnosing this on a more recent branch probably makes more sense, but I'd be a little leery of dealing with two problems, not just one.

I can provide various data: the (gigantic) seemingly "corrupt" journal/timer files and the segfault messages from the syslog. I don't have any coredumps, but I'll turn those on today. Given the nature of the problem, it might take a while for it to manifest.

Chuck
-- https://lists.nic.cz/cgi-bin/mailman/listinfo/knot-dns-users