One of the most useful initial steps in troubleshooting is to establish your ability to reproduce the error.
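
As a rough sketch of that reproduce-then-vary approach, the snippet below only builds the "dig" command lines you might try; it does not send any queries. It toggles the two extensions discussed below, EDNS0 and DNSCOOKIE. The helper name `dig_variants` and its `server` parameter are made up for illustration, and the `+edns=0`, `+noedns`, `+cookie` and `+nocookie` options assume a reasonably recent "dig" (older builds may not know `+cookie`). The query name is taken from the log quoted further down.

```python
import itertools
import shlex


def dig_variants(name, rtype="A", server=None):
    """Build dig argument lists for EDNS0 on/off and COOKIE on/off.

    Hypothetical helper: 'server' is whichever server you want to test
    against; if omitted, dig falls back to the system resolver.
    """
    base = ["dig"]
    if server:
        base.append("@" + server)
    base += [name, rtype]

    variants = []
    for edns, cookie in itertools.product((True, False), repeat=2):
        if cookie and not edns:
            # A COOKIE is carried as an EDNS option, so this pair is skipped.
            continue
        opts = [
            "+edns=0" if edns else "+noedns",
            "+cookie" if cookie else "+nocookie",
        ]
        variants.append(base + opts)
    return variants


if __name__ == "__main__":
    # The first command mirrors the "+E(0)K" query from the log;
    # the remaining ones switch the extensions off one at a time.
    for argv in dig_variants("mail.babioch.de"):
        print(shlex.join(argv))
```

Run the first command to see whether the SERVFAIL reproduces, then compare the others against it. Pass `server="..."` with the address of the server you are aiming at if you want to bypass the local resolver.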
So, I'd look at getting to the command line of the originating resolver, if possible, and using a command-line tool like "dig" to generate queries towards the intended target and see if you get the same SERVFAIL result.

In order to *exactly* replicate the queries, however, you need to understand what "+E(0)K" in the log means. That's recursion-desired (the default for a query generated by a command-line tool like "dig"), EDNS0, and a DNSCOOKIE requested. Supposedly, modern versions of "dig" will set EDNS0 and DNSCOOKIE by default, so you might be lucky and a straight "dig" with no special options will replicate the error. If not, you may need to get your hands on a more modern version of "dig", or use another tool.

Once you've replicated the error, start changing things. I'd start with turning EDNS0 and/or DNSCOOKIE on and off. Both of those are relatively "modern" extensions to DNS (at least, compared to the "classic" DNS of RFCs 1034 and 1035) and it's possible that the responding server just doesn't deal with them properly.

With EDNS0, there are different buffer sizes that could, hypothetically, be tried. But, unless you've tuned that specifically in named.conf, it should be the case that the "dig" default is the same as the "named" one, so it's unlikely that changing the buffer size will produce any change in behavior. It's possible, I suppose...

If you can't get any change of behavior by twiddling those things, then one would have to delve deeper. But I won't make this post any longer than it already is :-)

That should be enough to get you started...

- Kevin

On Mon, Oct 1, 2018 at 3:34 PM Karol Babioch <ka...@babioch.de> wrote:

> Hi,
>
> On 01.10.18 at 21:10, Karol Babioch wrote:
> > Do you have any suggestion / recommendation what I can do to narrow the
> > problem down? I already tried to increase the tracing and enabled query
> > logging, but I couldn't get to the bottom of things. What else can I do
> > here?
>
> as an additional data point, this is what I get with debugging (level 9):
>
> > 01-Oct-2018 21:25:52.976 query-errors: debug 1: client @0x7f89401d4c10 10.24.0.1#51206 (mail.babioch.de): view internal: query failed (SERVFAIL) for mail.babioch.de/IN/A at query.c:10672
>
> > 01-Oct-2018 21:25:52.976 query-errors: debug 2: fetch completed at resolver.c:9094 for mail.babioch.de/A in 0.030641: SERVFAIL/success [domain:babioch.de,referral:2,restart:0,qrysent:0,timeout:0,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0,qminsteps:2]
>
> I really don't get it, the fetch completes just fine according to this
> (SERVFAIL/success). Also the querylog does not indicate what the issue is:
>
> > Okt 01 21:30:53 kvm1.babioch.de named[17380]: client @0x7f15e8056140 10.24.0.1#58354 (mail.babioch.de): view internal: query: mail.babioch.de IN A +E(0)K (10.24.0.1)
>
> > Okt 01 21:30:53 kvm1.babioch.de named[17380]: client @0x7f15e8056140 10.24.0.1#58354 (mail.babioch.de): view internal: query failed (SERVFAIL) for mail.babioch.de/IN/A at query.c:10672
>
> Can one of you BIND gurus see what's wrong here? What else can/should I
> try. I'm pretty much out of ideas for now.
>
> Thank you very much in advance!
>
> Best regards,
> Karol Babioch
>
> _______________________________________________
> Please visit https://lists.isc.org/mailman/listinfo/bind-users to
> unsubscribe from this list
>
> bind-users mailing list
> bind-users@lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users