Hi Libor,

Thanks for providing update_no_change_faster. Here are the results running
on the same hardware with RR counts.
https://drive.google.com/file/d/1ZAwtEWLY0hFSRR3QX-EzVRFnch7WKHMX/view?usp=drive_link

There doesn't seem to be any measurable improvement in times, but the RR
counts relative to the zone sizes may be useful context. And unfortunately
the record distribution across the zones is inherent to the workload, so
it isn't something we could easily restructure.

These numbers have production noise baked in, and the average time spent on
an update per zone is meeting our 5-second goal. My concern is that there
is very little headroom for staying under 5 seconds going forward, given
that the zone data isn't easily predictable. I'm happy to run a more
isolated test as well if you'd prefer to see the numbers under those conditions.
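
For the isolated test, the plan would be to point a lab instance at a
non-production primary and reuse the same stripped-down secondary settings
from earlier in this thread. A minimal knot.conf sketch (the remote name,
address, and zone are placeholders; the three zone options and the worker
count are our current values, not a recommendation):

    server:
        background-workers: 2

    remote:
      - id: lab_primary
        address: 192.0.2.1

    zone:
      - domain: example.net.
        master: lab_primary
        zonefile-sync: -1
        zonefile-load: none
        journal-content: none

That should let us compare update_no_change_faster against master without
production noise in the numbers.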

Thanks

On Mon, Mar 16, 2026 at 6:42 AM Libor Peltan <[email protected]> wrote:

> Hi Jonathan,
>
> thanks much for providing the flame graph and some measurements. Any
> insight to real-world deployments is always useful for us!
>
> However, from the provided profile it doesn't really look that the biggest
> problem would be the routines iterating the whole zone content tree. Even
> if they were theoretically completely eliminated, we could expect
> improvement in some tens of per cent, but not multiple-fold speed increase.
>
> Anyway, one thing in the flame graph immediately caught my eye and I made
> a very simple, but potentially measurable improvement:
> https://gitlab.nic.cz/knot/knot-dns/-/commits/update_no_change_faster
>
> Would you be able to build Knot DNS from this branch, re-run your
> measurements and conclude if this already is an improvement for you?
>
> By the way, you said that you expect the updates to be applied under 5
> seconds, which already seems to be the case according to your measurements.
> Does it mean that what you seek is "just" some improvement in tens of per
> cent?
>
> Thank you,
>
> Libor
> On 13. 03. 26 18:18, Jonathan Reed wrote:
>
> > As a consequence, there was lower motivation for speeding up zone updates
> > processing -- historically...
>
> Thank you for sharing some history, that's helpful to know.
>
> And thanks for waiting on our information. Here is a profile during IXFRs
> and a small snippet after parsing zone logs of updates with changes to
> adjust-threads.
>
> https://drive.google.com/file/d/19wP9MvAHNQ0dA7fDMlx_GELGa2dMNmNm/view?usp=sharing
>
> https://drive.google.com/file/d/1GgWmffFMJ-7kJ9gZNUTlSq7zipNDPp2C/view?usp=sharing
>
> > Do you really need to apply the updates in an instant manner...
>
> We're dedicating 2 cores and 2 background-workers to process the updates,
> but being able to cleanly apply all updates in under 5 seconds would be
> ideal. Admittedly, throwing more hardware at this workload would improve
> our situation, but hopefully these tests still prove helpful in at least
> showing us where our cycles are going.
>
> Thanks!
>
> On Thu, Mar 12, 2026 at 5:42 AM Libor Peltan <[email protected]> wrote:
>
>> Hi Jonathan,
>>
>> thank you for reaching us and for such a deep insight into Knot DNS.
>>
>> Let me start by explaining some history.
>>
>> Knot DNS was designed around two mainstays: 1) query answering is fast
>> (also by pre-adjusting the zone contents carefully) 2) updating the zone
>> does not affect the answering speed.
>>
>> As a consequence, there was lower motivation for speeding up zone
>> updates processing -- historically, it was always single-threaded and
>> always proportional to the zone size (not update size).
>>
>> Several years ago, our big supporter operating large zone asked us to
>> improve this, and we did what we could at the time -- many parts of the
>> zone update processing became incremental (proportional to update size),
>> especially those that took most time (like NSEC3-relevant cross-pointers
>> that demanded two hash computations per domain). This included
>> introduction of incremental DNSSEC signing and validation (including
>> unique NSEC(3) chain processing routines).
>>
>> For the cases when things couldn't get really incremental, we also
>> introduced parallelized processing:
>> https://www.knot-dns.cz/docs/3.5/singlehtml/#signing-threads and
>> https://www.knot-dns.cz/docs/3.5/singlehtml/#adjust-threads (the latter
>> might be interesting to you!).
>>
>> However, some parts of update processing remained proportional to zone
>> size (and your observations confirm this). Yes, the whole QP-trie is
>> always iterated. We hope that those procedures are fast enough in
>> general, so that it doesn't really hurt (few seconds per
>> million-RR-zone?).
>>
>> I can't really say if (or if not) it is possible to incrementalise this
>> further. For us, correct Knot DNS behavior in all cases is most
>> important, so we can't really say "we just don't need this and that in
>> the simple case, so let's skip some edge-case correctness for the sake
>> of speed". Just for illustration, imagine a deep and branchy zone, where
>> an incremental update adds a single NS, which occludes many subordinate
>> RRs that become non-authoritative, with many consequences...
>>
>> Yes, in theory those adjustments could be conducted only on affected
>> subtrees and the "prev" pointer might not be really needed without NSECs
>> and wildcards -- but I'd be really afraid to modify the code in this
>> manner :( Also my personal effort in advocating DNSSEC motivates me less
>> to optimizations that only take place without DNSSEC...
>>
>> Anyway, I'd be really interested if you perform your tests with a
>> profiler, in order to see what are the concrete bottlenecks in your
>> case. Would you be able and willing to do this for us?
>>
>> I'd also like to know what are your goals. Do you really need to apply
>> the updates in an instant manner (and what is the target time versus
>> current time?), or you are just observing a choking and resource
>> exhaustion and would actually benefit from slowing down the update
>> processing pace, by e.g. artificially limiting the frequency of updates?
>> Anyway, I'm a bit surprised that BIND 9 is not the bottleneck in this
>> case :)
>>
>> Thanks!
>>
>> Libor
>>
>> On 12. 03. 26 0:15, Jonathan Reed wrote:
>> > Hi Knot team,
>> >
>> > I'm running Knot as an Auth secondary receiving IXFR from a BIND 9
>> > primary. To isolate bottlenecks I've stripped the config down as far
>> > as I know how. Here's what I'm using.
>> > zonefile-sync: -1
>> > zonefile-load: none
>> > journal-content: none
>> >
>> > There is no DNSSEC or any downstream IXFR serving happening. Logs
>> > confirm it is genuine IXFR with no signs of any AXFR fallback.
>> > "semantic-checks" is off, and knotd is linked against jemalloc. I'm
>> > really trying to make this as quick as possible by avoiding the disk.
>> >
>> > The pattern:
>> > IXFR processing time scales roughly proportionally with total zone
>> > size, even when the changeset is small, for example, a few hundred RRs
>> > out of several hundred thousand.
>> > There is what appears to be a full zone walk on every IXFR commit in
>> > the adjust logic, with single-threaded execution due to parent-before-child
>> > ordering requirements, though I'd want your confirmation
>> > before reading too much into it.
>> >
>> > Questions:
>> > 1. With journal-content: none, does IXFR apply trigger a full
>> > in-memory tree walk of the QP-trie, rather than an isolated
>> > incremental record-level update? If so, is that a necessary
>> > consequence of running without a journal to maintain state?
>> > 2. For a secondary with no NSEC/NSEC3, no wildcards or any downstream
>> > IXFR'ing, could a "lightweight secondary" mode bypass post-apply
>> > bookkeeping that might only be targeted at primaries and signers?
>> > 3. Could it rewalk only subtrees where adds or removes happen to their
>> > ancestors, rather than the full zone? If NSEC is absent, is the prev
>> > pointer chain actually used at query time, or can it be skipped
>> entirely?
>> >
>> > Our use case is secondary-only, with large zones and high frequency
>> > updates. We're hoping there is something on the configuration or
>> > roadmap side that might help, and ultimately not sure if we're just
>> > bumping up against a realistic constraint.
>> > Thanks for the great software btw, loving it.
>> >
>> > Thanks!
>> >
>> > --
>>
>