Hi,
I'm investigating an outage that happened on a BIND server. It was configured as a caching, resolving name server and was forwarding for one specific zone. This zone had two nameservers/forwarders, one of which became unreachable at some point due to a cable cut. The other nameserver turned out to be dropping any requests with the DO bit set.

What seems to have happened is:

1. The BIND nameserver would send 3 queries, 1s apart, with EDNS0 + DO bit, which were dropped.
2. BIND sends out a query without the DO bit and gets a response with TTL=0.
3. A burst of queued-up queries for that exact query gets rushed through for 1s (probably not more than max-clients-per-query though, which was set at 100).
4. Goto 1.

This not only caused resolving failures for the forwarded data, but within the hour caused the entire server to collapse under load. The number of clients asking for this data was higher than the max-clients-per-query setting.

My questions:

1. Is this problem happening because EDNS failure is not remembered for forwarders?
2. Is this problem happening because EDNS failure is forgotten once there is no more cached data that used the specified nameserver?
3. Does max-clients-per-query apply to forward zone queries too, or is it ignored?
4. Can this behaviour be changed via a configuration option, so that the EDNS failure is remembered and we are not unable to answer queries for 3 out of every 4 seconds?

Paul