
I'm investigating an outage that happened on a BIND server. It was
configured as a caching, resolving name server and was forwarding for
one specific zone. This zone had two nameservers/forwarders of which one
at some point was unreachable due to a cable cut. The other nameserver
turned out to be dropping any requests with the DO bit set.
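
A minimal sketch of how one might check whether a forwarder drops queries that carry the DO bit. It hand-builds a DNS query with an optional EDNS0 OPT record (RFC 6891) and sends it over UDP. The function names `build_query` and `probe`, the 4096-byte advertised payload size, and the 1s timeout are illustrative assumptions, not anything taken from BIND itself.

```python
import socket
import struct

DO_FLAG = 0x8000  # "DNSSEC OK" bit in the EDNS0 flags field (RFC 3225)
OPT_RRTYPE = 41   # OPT pseudo-RR type (RFC 6891)


def build_query(name, qtype=1, edns=True, do_bit=True, query_id=0x1234):
    """Build a minimal DNS query for a non-root name (hypothetical helper)."""
    arcount = 1 if edns else 0
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    packet = header + qname + struct.pack("!HH", qtype, 1)  # QCLASS = IN
    if edns:
        # The OPT "TTL" field holds ext-rcode (0), version (0) and the flags
        ttl = DO_FLAG if do_bit else 0
        # Root owner name, TYPE=OPT, CLASS=UDP payload size, TTL, RDLEN=0
        packet += b"\x00" + struct.pack("!HHIH", OPT_RRTYPE, 4096, ttl, 0)
    return packet


def probe(server, name, timeout=1.0, **kwargs):
    """Send one query over UDP; return True if any reply arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, **kwargs), (server, 53))
        try:
            sock.recvfrom(4096)
            return True
        except socket.timeout:
            return False
```

Comparing `probe(addr, zone, do_bit=True)` with `probe(addr, zone, do_bit=False)` for the same forwarder address and zone name would show whether only the DO-bit queries go unanswered.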

What seems to have happened is:

1 the bind nameserver would send 3 queries 1s apart with EDNS0+DO bit, which
2 bind sends out a query without the DO bit, it gets a response with TTL=0
3 a burst of queued up queries for that exact query gets rushed through for 1s
  (probably not more than max-clients-per-query though, which was set at 100)
4 goto 1
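
The cycle above can be put into rough numbers with a toy model. This is only a sketch under stated assumptions: the three EDNS0+DO queries are assumed to go unanswered for 1s each, the fallback answer (TTL=0) is assumed to be usable for about 1s, and `simulate_cycle` is a hypothetical helper, not a model of BIND's actual resolver code.

```python
def simulate_cycle(edns_attempts=3, retry_interval=1.0, serve_window=1.0, cycles=10):
    """Return (blocked_seconds, serving_seconds) accumulated over the cycles."""
    blocked = 0.0
    serving = 0.0
    for _ in range(cycles):
        # Step 1: each EDNS0+DO query is assumed to time out after one interval.
        blocked += edns_attempts * retry_interval
        # Steps 2-3: the fallback query is answered, but with TTL=0 nothing
        # stays cached, so queued clients are only served for a short window.
        serving += serve_window
        # Step 4: the EDNS failure is assumed forgotten, so the loop repeats.
    return blocked, serving


blocked, serving = simulate_cycle()
unavailable = blocked / (blocked + serving)
print(f"unanswerable for {unavailable:.0%} of the time")  # prints 75%
```

With these assumed timings the zone is unanswerable for 3 out of every 4 seconds.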

This not only caused resolving failures for the forwarding data, but within the 
caused the entire server to collapse under load. The number of clients asking
for this data was higher than the max-clients-per-query setting.

My questions:

1 Is this problem happening because EDNS failure is not remembered for 
2 Is this problem happening because EDNS failure is forgotten once there is no 
  data cached that used the specified nameserver?
3 Does max-clients-per-query apply to forward zone queries too, or is this 
4 Can this behaviour be changed via a configuration option so we can remember
  this EDNS failure, so that we're not unable to answer queries for 3 out of
  4 seconds?
