Hi,

On Mon, Jul 28, 2014 at 04:43:37PM -0700, Marcus Ewert wrote:
> Package: libc6
> Version: 2.13-38+deb7u1
> Severity: normal
>
> Hello,
>
> On test systems running stress workloads we were regularly encountering a
> bug in gethostbyname that is fixed in libc6 in jessie. For completeness
> I've included the entire repro/investigation process; however, we are
> fairly sure the bug is the same as Debian bug #722075. I'm writing to
> inquire if this bugfix can be backported to wheezy (stable).
>
> We encountered this bug on fractional core VMs running workloads that
> stress disk, CPU, and networking. As part of that testing we make many
> concurrent HTTP requests in Python, the relevant code being similar to:
>
> > def GetURL(**kwargs):
> >   url = 'http://www.example.com/'
> >   request = urllib2.Request(url)
> >   return urllib2.urlopen(request, **kwargs).read()
> >
> > def HammerGetHostByID():
> >   while True:
> >     try:
> >       GetURL(timeout=1)
> >     except:
> >       pass
> >
> > for _ in xrange(10):
> >   thread = threading.Thread(target=HammerGetHostByID)
> >   thread.start()
>
> Running a workload like this in 500 VMs running wheezy would yield O(8)
> failures over 24 hours with the following output:
>
> *** glibc detected *** /usr/bin/python: double free or corruption (out)
>
> Digging a little deeper with a debugger we found that whenever these were
> hit, the stack would contain _nss_dns_gethostbyname4_r and have garbage
> stack frames above that. The gethostbyname() call most likely comes from
> the above urlopen.
>
> Given this observation, we suspected a connection to Debian bug #722075,
> and attempted the following patch to libc6:
>
> diff -rupN eglibc-2.13/resolv/res_send.c eglibc-2.13-mod/resolv/res_send.c
> --- eglibc-2.13/resolv/res_send.c 2010-03-26 14:08:35.000000000 -0700
> +++ eglibc-2.13-mod/resolv/res_send.c 2014-07-02 10:23:28.521088097 -0700
> @@ -1330,6 +1330,7 @@ send_dg(res_state statp,
>                                         retval = reopen (statp, terrno, ns);
>                                         if (retval <= 0)
>                                                 return retval;
> +                                       pfd[0].fd = EXT(statp).nssocks[ns];
>                                 }
>                         }
>                         goto wait;
>
> With this single-line patch we no longer hit the 'double free or
> corruption' message even when running 100 VMs for over 5 days. I extracted
> the above code fix from
> https://lists.debian.org/debian-glibc/2014/06/msg00013.html, but modified
> the diff to fit on 2.13-38+deb7u1.
>
> If a fix similar to this could be included in wheezy stable at some point
> it would be much appreciated.
>

I have just committed the change in our stable branch [1]. We'll upload
the package a bit before the next Debian stable release, if the release
team agrees with the changes (which is likely in that case).

[1] http://anonscm.debian.org/viewvc/pkg-glibc?view=revision&revision=6227

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
[email protected]                 http://www.aurel32.net
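
Editorial note: the one-line patch quoted above re-reads the socket descriptor into pfd[0].fd after reopen(). The toy model below is only a sketch of that pattern in Python, not glibc's code. The class name CachedDescriptor and its methods are hypothetical, and it assumes the underlying problem is a cached descriptor number that no longer matches the socket actually in use once the socket has been reopened.

```python
import socket


class CachedDescriptor:
    """Hypothetical model: a socket plus a cached copy of its fd number."""

    def __init__(self):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Cached copy of the descriptor, playing the role of pfd[0].fd.
        self.poll_fd = self.sock.fileno()

    def reopen(self, refresh=True):
        # Open the replacement before closing the old socket, so the new
        # descriptor number is guaranteed to differ from the old one.
        new_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.close()
        self.sock = new_sock
        if refresh:
            # Analogue of the added line: refresh the cached descriptor.
            self.poll_fd = self.sock.fileno()

    def is_stale(self):
        # True when the cached number no longer refers to the live socket.
        return self.poll_fd != self.sock.fileno()

    def close(self):
        self.sock.close()
```

Calling reopen(refresh=False) leaves is_stale() returning True, which is the state the patch avoids; with refresh=True the cached value tracks the new socket.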

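
Editorial note: the workload in the report is written for Python 2 (urllib2, xrange). For anyone retesting on a current system, a rough Python 3 approximation might look like the sketch below. The function names, the bounded iteration count, and the injectable fetch parameter are assumptions added for illustration; the original looped forever and did not count failures.

```python
import threading
import urllib.request


def get_url(url="http://www.example.com/", **kwargs):
    """Fetch a URL; the hostname lookup happens inside urlopen."""
    request = urllib.request.Request(url)
    with urllib.request.urlopen(request, **kwargs) as response:
        return response.read()


def hammer(fetch=get_url, iterations=100):
    """Call fetch repeatedly, ignoring errors; return the failure count."""
    failures = 0
    for _ in range(iterations):
        try:
            fetch(timeout=1)
        except Exception:
            failures += 1
    return failures


def run_workload(threads=10, fetch=get_url, iterations=100):
    """Run hammer() in several threads and collect per-thread failures."""
    results = []
    lock = threading.Lock()

    def worker():
        count = hammer(fetch, iterations)
        with lock:
            results.append(count)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Passing a stub as fetch allows a dry run without network access; run_workload() with the defaults issues real HTTP requests from 10 threads, as in the report.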
