There is a leak somewhere in the webserver. It gradually consumes more
memory, possibly only when page hits actually happen, until the slice starts
paging, which impacts other clients using the same machine. Rackspace force
rebooted it and threatened to take action (like disabling the slice or
something).

I can also see this on my Mac, which has 4 times the memory (2GB; the
Rackspace slice has only 500MB).

I have modified the webserver to include a simple fthread that calls the GC on
a timer. I can now observe two problems:

1. The total number of "still allocated" bytes after a collection grows slowly.
After a few thousand hits, it grew from 16K to about 20K. Bytes. I doubt
felix-lang.org is getting ANY hits at all except when I look at it, which isn't
often since I have a local one.

I also fooled my test by setting the initial memory threshold to 0.
So the amount of reachable GC memory is slowly growing.
It's not clear how this can happen from thousands of requests to the same page.
There are no logs and no persistent state. After a GC the webserver memory
use should drop to a fixed constant every time: at worst some garbage is
reachable, but it should be the SAME garbage since I'm fetching the same page.

2. The PROCESS memory use, however, is growing much faster. RPRVT as
reported by "top" grows visibly during page loads. It's up to 1764K at the
moment. I have no idea what the unit is; I guess 4K pages. But it started
under 1600K. It doesn't go down. I would expect that, with regular GC, the
actual memory required would max out eventually.

Now, there's a known issue with the GC particularly with the webserver.
Felix uses C++ strings and the webserver makes pages as strings.
There's a lot of string concatenation etc.

Dynamically allocated strings in Felix show up as the string control block;
typically in C++ this would be a pointer to the char array and a length
(although gcc/clang somehow seem to use only a pointer: strings are 8 bytes).
The Felix GC knows about such control blocks of 8 or so bytes, but NOT the
char array behind the char *, which could be large (all the HTML for a page
in one string).
So the GC thinks a few K of memory is in use, but the actual memory use
could be thousands of times higher. So it may fail to trigger when it should.
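To make the mismatch concrete, here is a small C++ sketch that compares the size of the string object itself (what a collector that only sees control blocks would count) with the character storage the string owns. The names StringFootprint, footprint and build_page are made up for illustration and are not part of Felix. Note that sizeof(std::string) depends on the standard library in use and is often larger than the 8 bytes mentioned above, and that capacity() + 1 is only an approximation of the real allocation size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// What a collector that only knows the control block would count,
// versus the character storage the string owns behind that block.
struct StringFootprint {
    std::size_t control_block_bytes; // sizeof the string object itself
    std::size_t char_storage_bytes;  // approximate size of the owned char buffer
};

StringFootprint footprint(const std::string& s) {
    // capacity() excludes the terminating NUL, so add one byte for it.
    return StringFootprint{sizeof(std::string), s.capacity() + 1};
}

// Hypothetical page builder: concatenates rows into one big string,
// roughly as a webserver might when rendering HTML.
std::string build_page(std::size_t rows) {
    std::string page = "<html><body>";
    for (std::size_t i = 0; i < rows; ++i) {
        page += "<p>row " + std::to_string(i) + "</p>";
    }
    page += "</body></html>";
    return page;
}
```

Calling footprint(build_page(1000)) gives a control block of a few tens of bytes at most and a character buffer of well over ten thousand bytes, which is the gap a threshold based only on control blocks would miss.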

However my timed collection should fix that. Felix destroys C++ strings using
a finaliser, which is the C++ string destructor, which should release the char
array.
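One way to check that claim in isolation is a sketch like the following, which is not Felix runtime code. It assumes a counting allocator (CountingAlloc, a made-up name) so that the bytes held by character arrays can be observed, constructs a string in caller-provided storage the way a collector would in memory it manages, and then runs the destructor as the finaliser. The helpers make_in_place and finalise_string are likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Bytes currently held by character arrays allocated through CountingAlloc.
std::size_t g_live_char_bytes = 0;

// Minimal allocator that records how much it has handed out.
template <class T>
struct CountingAlloc {
    using value_type = T;
    CountingAlloc() = default;
    template <class U> CountingAlloc(const CountingAlloc<U>&) {}
    T* allocate(std::size_t n) {
        g_live_char_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        g_live_char_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

using TrackedString =
    std::basic_string<char, std::char_traits<char>, CountingAlloc<char>>;

// Construct a string in caller-provided storage, standing in for a
// control block that lives in collector-managed memory.
TrackedString* make_in_place(void* storage, std::size_t n_chars) {
    assert(storage != nullptr);
    return new (storage) TrackedString(n_chars, 'x');
}

// Finaliser: run the destructor only. The control block's own storage is
// not freed here; that would be the collector's job.
void finalise_string(void* p) {
    assert(p != nullptr);
    static_cast<TrackedString*>(p)->~TrackedString();
}
```

If g_live_char_bytes returns to zero after finalise_string, the destructor did release the char array, so a leak would have to come from finalisers not being run at all or from objects that are never collected.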

I don't know what the problem is. One possibility is this: when fthreads do async I/O,
they get put in a wait queue. The pthread doing the actual event monitoring
releases them back into the ordinary queue for scheduling when a suitable
event occurs.

However that thread basically knows nothing about GC. So what happens is that
the fthread gets turned into a GC root, as if it had been passed to a foreign
library, so the collector considers it reachable. After it is put back on the
ordinary queue, it is "unrooted" again. That's the theory. Perhaps the
unrooting is not working, so after the fthread becomes unreachable it isn't
deleted because it is still a root.
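The theory can be illustrated with a toy mark-and-sweep collector in C++. This is only a model under stated assumptions: ToyCollector, add_root, remove_root and simulate are invented names and do not reflect the actual Felix GC interface. Each simulated request creates an fthread-like object that owns a buffer, roots it while it waits for I/O, and then either unroots it on wakeup or, in the buggy case, does not.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Toy collector: objects are integer ids, edges are "points to" links.
// All names here are illustrative and are not the Felix runtime API.
class ToyCollector {
public:
    int allocate() { int id = next_id_++; live_.insert(id); return id; }
    void link(int from, int to) {
        assert(live_.count(from) && live_.count(to));
        edges_[from].push_back(to);
    }

    // Roots are counted so that nested add/remove pairs balance.
    void add_root(int id) { ++roots_[id]; }
    void remove_root(int id) {
        auto it = roots_.find(id);
        if (it != roots_.end() && --it->second == 0) roots_.erase(it);
    }

    // Mark everything reachable from the roots, then sweep the rest.
    // Returns the number of objects still alive after the collection.
    std::size_t collect() {
        std::set<int> marked;
        std::vector<int> work;
        for (const auto& r : roots_) work.push_back(r.first);
        while (!work.empty()) {
            int id = work.back();
            work.pop_back();
            if (!live_.count(id) || !marked.insert(id).second) continue;
            auto it = edges_.find(id);
            if (it != edges_.end())
                for (int t : it->second) work.push_back(t);
        }
        for (auto it = live_.begin(); it != live_.end();) {
            if (marked.count(*it)) { ++it; }
            else { edges_.erase(*it); it = live_.erase(it); }
        }
        return live_.size();
    }

private:
    int next_id_ = 0;
    std::set<int> live_;
    std::map<int, int> roots_;
    std::map<int, std::vector<int>> edges_;
};

// Simulate `hits` requests, then collect once and report survivors.
std::size_t simulate(std::size_t hits, bool unroot_on_wakeup) {
    ToyCollector gc;
    for (std::size_t i = 0; i < hits; ++i) {
        int fthread = gc.allocate();
        int buffer = gc.allocate();
        gc.link(fthread, buffer);
        gc.add_root(fthread);                          // parked in the wait queue
        if (unroot_on_wakeup) gc.remove_root(fthread); // back on the ordinary queue
        // The request finishes here; nothing else refers to the fthread.
    }
    return gc.collect();
}
```

In this model simulate(1000, true) leaves nothing alive, while simulate(1000, false) leaves every fthread and its buffer alive, so the reachable total grows with the number of hits, which is the same shape as the slow growth in "still allocated" bytes described above.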

I sure hope there's no fundamental problem with the C++ side of the async
I/O stuff leaking; however, the GC can see the leak. Well, at least one of them :)


--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language
