Hi!

At our site we wanted to copy_forward all incoming mail to another 
server, so this backup server only gets connections from the MX 
machine. I ran into big trouble because most of the DNS lookups had 
a 2 second delay and fell back to the second nameserver in 
resolv.conf. Since both servers have to handle about 1000 
mails/minute, this was a very big issue.

tcpdump showed that all delayed DNS requests from exim had the same 
query id. Since our nameservers have "use-id-pool" set, they ignored 
most of the requests, which explained the 2 second delays and even 
complete timeouts I was seeing.
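
For anyone who wants to compare ids in their own captures: the query 
id is the first 16 bits of the DNS message, in network byte order 
(RFC 1035), and as far as I know it is the decimal number tcpdump 
prints in front of the query flags. A minimal sketch for pulling it 
out of a raw message; dns_query_id() is a made-up helper name, not 
part of libresolv:

```c
#include <stddef.h>

/*
 * Return the 16-bit query id of a raw DNS message, or -1 if the
 * buffer is shorter than the fixed 12-byte DNS header.
 * dns_query_id() is a hypothetical helper, not a libresolv function.
 */
int dns_query_id(const unsigned char *msg, size_t len)
{
    if (msg == NULL || len < 12)
        return -1;

    /* The id is stored big-endian in the first two bytes. */
    return (msg[0] << 8) | msg[1];
}
```

Two requests to the same server that return the same value here are 
the ones that collide.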

So I dug deeper to find out why exim was using the same query ids. 
Since exim uses libresolv I thought that it couldn't be exim's 
fault, so I wrote proof of concept code to reproduce this 
behaviour. And I was successful.

I discovered that every first call to res_search() after a fork() 
uses the same query id the parent has in _res.id right before the 
fork(). I've included my PoC code. Run it alongside a 
"tcpdump port 53" and you'll see what I mean. At least the 
libresolv versions on FC4 (2.3.6), Ubuntu dapper (2.3.6) and 
FC5 (2.4) behave that way.
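
The mechanism can be shown without libresolv at all. fork() gives 
the child an exact copy of the parent's memory, so any id state that 
the parent leaves untouched is identical in every child. The sketch 
below is only an illustration under that assumption: fake_res_id and 
next_query_id() are made-up stand-ins for _res.id and for whatever 
the resolver does internally to pick the next id, not glibc code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the resolver's per-process state (_res.id). */
unsigned short fake_res_id = 4711;

/* Hypothetical stand-in for the id selection done on each lookup. */
unsigned short next_query_id(void)
{
    return ++fake_res_id;
}

/*
 * Fork n children from a parent that does no lookups in between.
 * Each child reports the id of its first "query" through a pipe.
 * Returns 0 on success, -1 on error.
 */
int first_ids_of_children(int n, unsigned short *ids)
{
    int i;

    for (i = 0; i < n; i++) {
        int fd[2];
        int ok;
        pid_t pid;

        if (pipe(fd) < 0)
            return -1;

        pid = fork();
        if (pid < 0) {
            close(fd[0]);
            close(fd[1]);
            return -1;
        }
        if (pid == 0) {
            /* child: starts with an exact copy of fake_res_id */
            unsigned short id = next_query_id();

            close(fd[0]);
            _exit(write(fd[1], &id, sizeof id) == (ssize_t) sizeof id ? 0 : 1);
        }

        /* parent: read the child's first id, then reap the child */
        close(fd[1]);
        ok = read(fd[0], &ids[i], sizeof ids[i]) == (ssize_t) sizeof ids[i];
        close(fd[0]);
        waitpid(pid, NULL, 0);
        if (!ok)
            return -1;
    }
    return 0;
}
```

Calling first_ids_of_children(3, ids) fills all three slots with the 
same value, because the increment happens in each child's private 
copy and the parent's fake_res_id never changes.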

Since the listening exim process does no lookups itself, all its 
forked children behave like that, and the reverse lookups for the 
MX machine's IP all have the same query id for as long as the 
listener runs.

Since I'm not sure if it's the responsibility of glibc/libresolv to 
set a new query id on fork(), I report this bug here ;-) At least a 
"couple of installations" out in the wild use these versions of 
libresolv and the workaround is pretty simple.

_res.id = res_randomid();
...after the fork. Calling res_init() again didn't help.
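
If patching every fork() call site is not practical, one alternative 
(not what my patch below does) would be to register a child handler 
with pthread_atfork() once at startup, so the reseed runs in every 
child automatically. This is only a sketch under assumptions: 
fake_res_id stands in for _res.id, and the getpid()-based value is a 
placeholder so the example needs no libresolv. In real code the 
handler body would be the one-liner above.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the resolver's per-process state (_res.id). */
unsigned short fake_res_id = 4711;

/*
 * Runs in every child right after fork(). Placeholder reseed: in real
 * code this body would be `_res.id = res_randomid();`.
 */
void reseed_in_child(void)
{
    fake_res_id = (unsigned short) ((unsigned) getpid() & 0xffffu);
}

/* Register the handler once at startup; returns 0 on success. */
int install_reseed_handler(void)
{
    return pthread_atfork(NULL, NULL, reseed_in_child);
}

/*
 * Fork one child and return the id state it starts with (0..65535),
 * or -1 on error. The child's pid is stored in *child_pid.
 */
int id_seen_by_child(pid_t *child_pid)
{
    int fd[2];
    int ok;
    unsigned short id = 0;
    pid_t pid;

    if (pipe(fd) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: the atfork handler has already run at this point */
        close(fd[0]);
        _exit(write(fd[1], &fake_res_id, sizeof fake_res_id)
              == (ssize_t) sizeof fake_res_id ? 0 : 1);
    }

    /* parent: read what the child saw, then reap the child */
    close(fd[1]);
    ok = read(fd[0], &id, sizeof id) == (ssize_t) sizeof id;
    close(fd[0]);
    waitpid(pid, NULL, 0);
    if (!ok)
        return -1;

    *child_pid = pid;
    return id;
}
```

After install_reseed_handler(), each child starts with its own value 
while the parent's state stays untouched. The trade-off is that the 
handler then runs after every fork() in the process, including ones 
that never do a lookup.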

So I fixed my problem for now with:
---------------
--- src/child.c.orig    2006-09-02 21:13:48.000000000 +0200
+++ src/child.c 2006-09-02 21:16:26.000000000 +0200
@@ -78,6 +78,9 @@
  uschar **argv =
    store_get((extra + acount + MAX_CLMACROS + 16) * sizeof(char *));

+/* resolver bug workaround */
+_res.id = res_randomid();
+
  /* In all case, the list starts out with the path, any macros, and a changed
  config file. */

----------------
But I'm sure this is not the best place.

Regards, Wolfgang Breyha
University of Vienna

PS: the proof of concept....
----------------
/* build on glibc: cc poc.c -o poc -lresolv */
#include <sys/types.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <netdb.h>
#include <resolv.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void dodnslookup(const char *host);

int
main(int argc, char *argv[])
{
     int i;

     /* sanity check: one (and only one) argument? */
     if (argc != 2) {
         (void) fprintf(stderr, "usage: %s host\n", argv[0]);
         exit(1);
     }

     (void) res_init();

     printf("after init: %d\n", _res.id);

     dodnslookup(argv[1]);
     printf("parent: %d\n", _res.id);

     for (i = 0; i < 3; i++)
     {
         int status;
         pid_t pid;

         /* flush so buffered output is not duplicated in the child */
         fflush(stdout);

         pid = fork();
         if (pid < 0)
         {
             perror("fork");
             exit(1);
         }
         if (pid > 0)
         {
             /* parent: wait for the child so the output stays in order */
             (void) waitpid(pid, &status, 0);
         }
         else
         {
             /* uncomment the next line to apply the workaround */
             /* _res.id = res_randomid(); */
             printf("child(%d) before: %d\n", i, _res.id);
             dodnslookup(argv[1]);
             printf("child(%d) after: %d\n", i, _res.id);
             exit(0);
         }
     }

     exit(0);
}

/* do one A lookup for host; the answer itself is not used */
static void
dodnslookup(const char *host)
{
     union {
         HEADER hdr;
         u_char buf[NS_PACKETSZ];
     } response;
     int responseLen;

     responseLen = res_search(host, ns_c_in, ns_t_a,
                              (u_char *)&response, sizeof(response));
     if (responseLen < 0)
         exit(1);
}
----------------
-- 
Wolfgang Breyha <[EMAIL PROTECTED]> | http://www.blafasel.at/
Vienna University Computer Center | Austria 


-- 
## List details at http://www.exim.org/mailman/listinfo/exim-users 
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://www.exim.org/eximwiki/
