Hi,

I'm investigating an outage that happened on a bind server. It was
configured as a caching resolving name server. It was forwarding for
one specific zone. This zone had two nameservers/forwarders of which one
at some point was unreachable due to a cable cut. The other nameserver
turned out to be dropping any requests with the DO bit set.

What seems to have happened is:

1 the bind nameserver would send 3 queries 1s apart with EDNS0 and the DO bit
  set, all of which were dropped.
2 bind sends out a query without the DO bit and gets a response with TTL=0
3 a burst of queued-up queries for that exact query gets rushed through for 1s
  (probably not more than max-clients-per-query though, which was set at 100)
4 goto 1
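
To make the cycle above concrete, here is a rough Python sketch of the
timeline as I understand it. It is only a model of the observed behaviour,
not of bind's internals: the constants and function names are made up for
illustration, and it assumes each dropped EDNS0+DO attempt costs a full
second and that the TTL=0 answer leaves nothing cached, so every cycle
starts from scratch.

```python
# Hypothetical model of the observed retry cycle; names are illustrative only.
EDNS_ATTEMPTS = 3      # queries sent with EDNS0 + DO bit, all dropped
RETRY_INTERVAL_S = 1   # assumed spacing between those attempts, in seconds
SERVE_WINDOW_S = 1     # seconds during which queued clients get answers


def cycle_timeline(cycles):
    """Return a list of (second, state) tuples for the given number of cycles.

    state is "blocked" while the EDNS0+DO attempts time out, and "serving"
    during the short window after the plain (non-DO) query succeeds.
    """
    timeline = []
    t = 0
    for _ in range(cycles):
        # Steps 1: every EDNS0+DO attempt is dropped, so clients wait.
        for _ in range(EDNS_ATTEMPTS * RETRY_INTERVAL_S):
            timeline.append((t, "blocked"))
            t += 1
        # Steps 2-3: the plain query succeeds, queued clients are answered.
        # The answer has TTL=0, so nothing is cached and the cycle restarts.
        for _ in range(SERVE_WINDOW_S):
            timeline.append((t, "serving"))
            t += 1
    return timeline


def blocked_fraction(timeline):
    """Fraction of seconds in the timeline during which nothing is answered."""
    blocked = sum(1 for _, state in timeline if state == "blocked")
    return blocked / len(timeline)


if __name__ == "__main__":
    print(blocked_fraction(cycle_timeline(2)))  # 0.75
```

Under those assumptions the server is unable to answer for this name 3 out
of every 4 seconds, regardless of how many cycles are simulated.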

This not only caused resolving failures for the forwarded data, but within the
hour caused the entire server to collapse under load. The number of clients
asking for this data was higher than the max-clients-per-query setting.

My questions:

1 Is this problem happening because EDNS failure is not remembered for
  forwarders?
2 Is this problem happening because EDNS failure is forgotten once there is no
  more data cached that used the specified nameserver?
3 Does max-clients-per-query apply to forward zone queries too, or is it
  ignored?
4 Can this behaviour be changed via a configuration option so that the EDNS
  failure is remembered and we are not left unable to answer queries for 3 out
  of 4 seconds?

Paul
_______________________________________________
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
