Jim Davidson wrote:
>
> Howdy,
>
> The leak is more precisely bloat, i.e., Tcl interps that create more
> and more objects and [...]   The memory is not
> actually "returned" via a munmap, e.g.  Nate Folkman and I spent
> tons of time trying to figure this out years ago with all sorts of
> coalescing, unmapping, etc. fun but failed.  It's possible other
> memory allocators have the same or better performance in speed and
> space today.

Things may be a bit better today, as (I believe) the core Tcl allocator 
has a separate pool specifically for Tcl_Objs, which are an extremely 
common small allocation.
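To make the "separate pool" idea concrete, here is a minimal, single-threaded sketch of a fixed-size free list in C. SmallObj, ObjPool, PoolAlloc, PoolFree and POOL_BATCH are all hypothetical names, not Tcl's actual allocator code, and the real threaded allocator also keeps per-thread caches that this skips. Note that chunks are never handed back to malloc, which is the same high-water-mark behavior Jim describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a small, very common object such as Tcl_Obj;
 * the real Tcl_Obj layout is not reproduced here. */
typedef struct SmallObj {
    long    refCount;
    void   *typePtr;
    double  value;
} SmallObj;

/* While an item sits on the free list, its storage holds the link. */
typedef union PoolItem {
    union PoolItem *next;
    SmallObj        obj;
} PoolItem;

typedef struct ObjPool {
    PoolItem *freeList;   /* items ready for reuse */
    size_t    nAlloced;   /* items ever carved from malloc'd chunks */
} ObjPool;

#define POOL_BATCH 64     /* hypothetical refill size */

/* Carve a batch of items out of one malloc'd chunk. Chunks are never
 * freed, so the pool only ever grows to its high-water mark. */
static int PoolRefill(ObjPool *pool)
{
    PoolItem *chunk = malloc(POOL_BATCH * sizeof(PoolItem));
    size_t    i;

    if (chunk == NULL) {
        return 0;
    }
    for (i = 0; i < POOL_BATCH; i++) {
        chunk[i].next = pool->freeList;
        pool->freeList = &chunk[i];
    }
    pool->nAlloced += POOL_BATCH;
    return 1;
}

static SmallObj *PoolAlloc(ObjPool *pool)
{
    PoolItem *item;

    if (pool->freeList == NULL && !PoolRefill(pool)) {
        return NULL;
    }
    item = pool->freeList;
    pool->freeList = item->next;
    return &item->obj;
}

/* Freeing just pushes the item back; nothing goes back to malloc or the OS. */
static void PoolFree(ObjPool *pool, SmallObj *obj)
{
    PoolItem *item = (PoolItem *) obj;

    assert(obj != NULL);
    item->next = pool->freeList;
    pool->freeList = item;
}
```

A freed object is reused by the very next PoolAlloc, so small allocations stay cheap, but the memory is only recycled inside the process.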

I think the 'vtamalloc' allocator by Zoran Vasiljevec (?) was 
specifically written to munmap/deallocate memory so that it could be 
reclaimed by the system and keep the high-water mark down.  It requires 
you to build Tcl with nonstandard defines, though, so you don't get the 
standard Tcl threaded allocator, which I think is a direct derivative of 
zippy.
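For contrast, an allocator that really does give memory back can map large blocks directly and munmap them on free. This is a hypothetical sketch (BigAlloc, BigFree and MapHeader are made-up names, not vtamalloc's code), assuming a POSIX system with MAP_ANONYMOUS. Each allocation costs at least one page and a system call, which is why general-purpose allocators don't do this for small objects.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Header at the start of each mapping so BigFree knows how much to unmap;
 * the union keeps the returned pointer maximally aligned. */
typedef union MapHeader {
    size_t      mapLen;
    max_align_t align;
} MapHeader;

static void *BigAlloc(size_t size)
{
    long       page = sysconf(_SC_PAGESIZE);
    size_t     mapLen;
    MapHeader *hdr;
    void      *p;

    assert(page > 0);
    /* Round header + payload up to a whole number of pages. */
    mapLen = (sizeof(MapHeader) + size + (size_t) page - 1)
             & ~((size_t) page - 1);
    p = mmap(NULL, mapLen, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    hdr = p;
    hdr->mapLen = mapLen;
    return hdr + 1;
}

/* Unmapping hands the pages straight back to the kernel. */
static int BigFree(void *ptr)
{
    MapHeader *hdr;

    if (ptr == NULL) {
        return 0;
    }
    hdr = (MapHeader *) ptr - 1;
    return munmap(hdr, hdr->mapLen);
}
```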

> Anyway, what's the race condition?  Curious about that one.

I'm going from memory here, because looking at the code now it seems 
like it shouldn't happen; but it was happening, and my change fixed what 
I was seeing.

The problem is that a pthread starts running immediately upon creation. 
If maxconns was set too low, a conn thread could run, exit, and add 
itself to the list of threads to be reaped before Ns_ThreadCreate (which 
is more or less a thin wrapper around pthread_create) had run to 
completion.  The next thread to be created would then reap the 
dead-threads list, but the thread id had not yet been written by 
Ns_ThreadCreate, so pthread_join would be called with a null tid and the 
server would segfault.  I could only reproduce the error with maxconns 
below about 10, running a benchmark like 'ab' against a fast request 
such as a fastpath file.

I "fixed" this by handing the Ns_Thread* that was passed to 
Ns_ThreadCreate directly to pthread_create, so pthread_create itself 
writes the tid.  On Linux, at least, the tid is written to the pthread_t 
inside pthread_create before the new thread starts running, but POSIX 
offers no such guarantee, so my "fix" might not work on Solaris or 
elsewhere (I only dug into so many library files).
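One way to avoid depending on Linux's ordering is a different technique from the direct-pass fix: the creator holds a lock across pthread_create and the store of the tid, and the exiting thread takes the same lock before putting itself on the dead list. A thread then can never become visible to the reaper before its tid has been written. This is a sketch with hypothetical names (ThreadRec, CreateThread, ReapDeadThreads), not AOLserver's actual nsthread code; compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread record, loosely modeled on the idea of an
 * Ns_Thread handle; not AOLserver's real structure. */
typedef struct ThreadRec {
    pthread_t         tid;
    int               tidValid;  /* set by the creator while holding listLock */
    struct ThreadRec *nextDead;
} ThreadRec;

static pthread_mutex_t listLock = PTHREAD_MUTEX_INITIALIZER;
static ThreadRec      *deadList = NULL;

static void *ThreadMain(void *data)
{
    ThreadRec *rec = data;

    /* Real work would go here; a short-lived conn thread may finish
     * before pthread_create has even returned in the creator. */
    pthread_mutex_lock(&listLock);
    /* The creator held listLock across pthread_create and the tid store,
     * so rec->tid is already written by the time we get here. */
    rec->nextDead = deadList;
    deadList = rec;
    pthread_mutex_unlock(&listLock);
    return NULL;
}

static int CreateThread(ThreadRec *rec)
{
    pthread_t tid;
    int       err;

    pthread_mutex_lock(&listLock);
    err = pthread_create(&tid, NULL, ThreadMain, rec);
    if (err == 0) {
        rec->tid = tid;
        rec->tidValid = 1;
    }
    pthread_mutex_unlock(&listLock);
    return err;
}

/* Join every thread that has put itself on the dead list;
 * returns how many were reaped. */
static int ReapDeadThreads(void)
{
    ThreadRec *list;
    int        n = 0;

    pthread_mutex_lock(&listLock);
    list = deadList;
    deadList = NULL;
    pthread_mutex_unlock(&listLock);

    while (list != NULL) {
        ThreadRec *next = list->nextDead;

        assert(list->tidValid);  /* never join an unset tid */
        pthread_join(list->tid, NULL);
        list = next;
        n++;
    }
    return n;
}
```

The cost is that a very fast thread may briefly block on listLock until its creator finishes, which is harmless here because the creator never waits on the child.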

I'm confusing myself now, though, because it certainly looks like the 
threads are only ever reaped by the driver thread, which should 
absolutely finish Ns_ThreadCreate before it can call it again.  Conn 
threads create a new thread to replace themselves when they hit 
maxconns, but they don't reap at that point, so that path should be OK 
too.

Either way, I had easily reproducible segfaults, and tweaking the thread 
code eliminated them, so I'm pretty sure I saw something real :)

-J

_______________________________________________
aolserver-talk mailing list
aolserver-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/aolserver-talk
