There is a leak somewhere in the webserver. It gradually consumes more memory, possibly only when page hits actually happen, until Rackspace force-rebooted it and threatened to take action (like disabling the slice or something). Eventually the slice starts paging, which impacts other clients using the same machine.
I can see this on my Mac too, which has 4 times the memory (2GB; the Rackspace slice has only 500MB). I have modified the webserver to include a simple fthread that calls the GC on a timer. I can now observe two problems:

1. The total number of "still allocated" bytes after a collection grows slowly. After a few thousand hits, it grows from 16K to about 20K. Bytes. I doubt felix-lang.org is getting ANY hits at all except when I look at it, which isn't often since I have a local one. I also fooled my test by setting the initial memory threshold to 0.

So the amount of reachable GC memory is slowly growing. It's not clear how this can happen from thousands of requests to the same page. There are no logs and no persistent state. After a GC the webserver memory use should drop to a fixed constant every time: at worst some garbage is reachable, but it should be the SAME garbage since I'm fetching the same page.

2. The PROCESS memory use, however, is growing much faster. RPRVT as reported by "top" grows visibly during page loads. It's up to 1764K at the moment. I have no idea what the unit is, I guess 4K pages. But it started under 1600K. It doesn't go down. I would expect, with regular GC, the actual memory required would max out eventually.

Now, there's a known issue with the GC, particularly with the webserver. Felix uses C++ strings and the webserver builds pages as strings. There's a lot of string concatenation etc. Dynamically allocated strings in Felix show up as the string control block; typically in C++ this would be a pointer to the char array and a length (although gcc/clang somehow seem to use only a pointer: strings are 8 bytes). The Felix GC knows about such control blocks of 8 or so bytes, but NOT the char array they point to, which could be large (all the HTML for a page in one string). So the GC thinks a few K of memory is in use, but the actual memory use could be thousands of times higher. So it may fail to trigger when it should. However, my timed collection should fix that.
Felix destroys C++ strings using a finaliser, which is the C++ string destructor, and that should release the char array.

I don't know what the problem is. One possibility is this: when fthreads do async I/O, they get put in a wait queue. The pthread doing the actual event monitoring releases them back into the ordinary queue for scheduling when a suitable event occurs. However, that thread knows basically nothing about the GC. So what happens is that the fthread gets turned into a GC root, as if it had been passed to a foreign library, so the collector considers it reachable. After it is put back on the ordinary queue, it is "unrooted" again.

That's the theory. Perhaps the unrooting is not working, so after the fthread becomes unreachable it isn't deleted because it is still a root.

I was sure hoping there's no fundamental problem with the C++ side of the async I/O stuff leaking; however, the GC can see the leak. Well, at least one of them :)

--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org

_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language