Hi Hillary,

By default, BIND returns SERVFAIL to the client if it can't complete
the full iteration process within 10 seconds.  This is controlled by
the "resolver-query-timeout" option.  As for why your recursive server
doesn't just try elsewhere, it _will_, but it assumes that it's
querying a valid nameserver, so the original query needs to time out
first.  BIND needs several queries to get its round-trip time cache in
order, and with six authoritative nameservers that takes longer than it
would with only three.
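
As a rough check against the tcpdump quoted below, the gap between the
client's first query and the SERVFAIL can be computed straight from the
two timestamps.  This is only a sketch: the helper name elapsed_seconds
is made up for illustration, and it assumes both timestamps fall on the
same day.

```python
from datetime import datetime

def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two tcpdump-style HH:MM:SS.ffffff timestamps (same day assumed)."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# Timestamps copied from the quoted tcpdump:
first_client_query = "12:33:38.593146"   # 10.79.1.6 -> 192.168.1.100, first A? query
servfail_sent = "12:33:48.593583"        # 192.168.1.100 -> 10.79.1.6, ServFail

print(f"{elapsed_seconds(first_client_query, servfail_sent):.6f}")  # prints 10.000437
```

That is almost exactly the 10-second default, which fits the SERVFAIL
coming from the overall resolution timeout rather than from any single
upstream query.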

As for 129.114.13.18 being lame - it's hard to be lame if you aren't
getting responses.  Lame just means that responses from the nameserver
aren't authoritative, even though it's listed in your NS records.
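
To make the distinction concrete, here is a minimal sketch of how the
two cases differ.  The function classify_server and its arguments are
hypothetical, not anything BIND exposes, and it deliberately reduces
"authoritative" to a single yes/no flag.

```python
from typing import Optional

def classify_server(got_response: bool, authoritative: Optional[bool] = None) -> str:
    """Classify a listed nameserver from one query result (simplified illustration)."""
    if not got_response:
        # No answer at all: we can't tell whether it would have been authoritative.
        return "unresponsive"
    # It answered: it is lame only if the answer was not authoritative.
    return "ok" if authoritative else "lame"

print(classify_server(got_response=False))                      # unresponsive
print(classify_server(got_response=True, authoritative=False))  # lame
print(classify_server(got_response=True, authoritative=True))   # ok
```

In your capture 129.114.13.18 never answers, so it falls into the first
case.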

Your best option is to fix the non-responding nameservers or remove
them from your NS records if they aren't supposed to respond to
queries - name resolution isn't just broken for you, it's broken for
everyone who wants to find web1.production.tacc.utexas.edu.

John

On Fri, Sep 9, 2016 at 5:23 PM, Hillary Nelson <nelsonhilla...@gmail.com> wrote:
> I should also mention that our BIND is 9.9.8-P4. What confuses me here is
> that the listed nameserver (129.114.13.18) is lame and our nameserver
> (192.168.1.100) can't get any responses from it (see tcpdump above), so why
> doesn't our nameserver try the other listed NS servers instead of sending
> 'ServFail' to the client (10.79.1.6)?
> Any help will be greatly appreciated!
>
> On Fri, Sep 9, 2016 at 1:07 PM, Hillary Nelson <nelsonhilla...@gmail.com>
> wrote:
>>
>> We've been seeing sporadic failures resolving the name
>> web1.production.tacc.utexas.edu from our nameserver.
>>
>> There are 6 NS records listed for the domain production.tacc.utexas.edu;
>> two of the six don't seem to work (dc1.production.tacc.utexas.edu
>> 129.114.13.17 and dc2.production.tacc.utexas.edu 129.114.13.18).
>>
>> If our nameserver hits those two and doesn't get any response, it sends
>> 'ServFail' to the client. Shouldn't our nameserver keep trying the other
>> four working nameservers listed for the domain?
>>
>> Here is the tcpdump:
>>
>> 12:33:38.593146 IP 10.79.1.6.51980 > 192.168.1.100.53: 60950+ [1au] A? tas.tacc.utexas.edu. (48)
>> 12:33:38.593573 IP 192.168.1.100.54985 > 129.114.13.18.53: 40455% [1au] A? web1.production.tacc.utexas.edu. (60)
>> 12:33:43.593131 IP 10.79.1.6.51980 > 192.168.1.100.53: 60950+ [1au] A? tas.tacc.utexas.edu. (48)
>> 12:33:47.593796 IP 192.168.1.100.49009 > 129.114.13.18.53: 38559% [1au] A? web1.production.tacc.utexas.edu. (60)
>> 12:33:48.593234 IP 10.79.1.6.51980 > 192.168.1.100.53: 60950+ [1au] A? tas.tacc.utexas.edu. (48)
>> 12:33:48.593583 IP 192.168.1.100.53 > 10.79.1.6.51980: 60950 ServFail 0/0/1 (48)
>>
>>
>> Thanks in advance for your help!
>>