Forgive me for not having read the whole thread; however, there is one thing that seems really important, and that is: ruby hardly ever runs the damned GC. It certainly doesn't do full runs nearly often enough (IMO).

Also, implicit OOMEs or GC runs quite often DO NOT affect the extensions correctly. I don't know what rmagick is doing under the hood in this area, but having been generating large portions of country maps with it (and moving away from it very rapidly), I know the GC doesn't do "The Right Thing".

First port of call is GC_MALLOC_LIMIT and friends. For any small script that doesn't breach that value, the GC simply doesn't run. More than this, RMagick, in its apparent 'wisdom', never frees memory if the GC never runs. Seriously, check it out. Make a tiny script, and make a huge image with it. Hell, make 20, get an OOME, and watch for a run of the GC. The OOME will reach your code before the GC calls on RMagick to free.

Now, add a call to GC.start, and no OOME. Despite its limitations (a ruby performance cost only, IMO), this holds up; most of the above experience was built up on windows, and last usage was about 6 months ago, FYI.
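A minimal, pure-Ruby sketch of that experiment (RMagick isn't needed to see the point; big throwaway strings stand in for the huge images, and the sizes are arbitrary):

```ruby
# Allocate a handful of "large images" and immediately drop the references.
# In a tiny script the GC may never have run on its own by this point, but
# an explicit GC.start always performs a collection, which is what lets an
# extension like RMagick release its buffers.
big = Array.new(5) { "x" * (1 << 20) }  # five ~1 MB strings, standing in for images
big = nil                               # all five buffers are now garbage

runs_before = GC.count                  # collections performed so far
GC.start                                # the explicit call suggested above
runs_after  = GC.count

puts "GC runs before explicit start: #{runs_before}"
puts "GC runs after explicit start:  #{runs_after}"
```

On MRI, `GC.count` goes up by at least one across the `GC.start`, confirming a real collection happened rather than a no-op.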

On 24 Mar 2008, at 20:37, Luis Lavena wrote:
On Mon, Mar 24, 2008 at 4:59 PM, Scott Windsor <[EMAIL PROTECTED]> wrote:
On Mon, Mar 24, 2008 at 12:18 PM, Luis Lavena <[EMAIL PROTECTED]> wrote:



On Mon, Mar 24, 2008 at 3:58 PM, Scott Windsor <[EMAIL PROTECTED]> wrote:



You're using *RMagick*, not ImageMagick directly. If you used the
latter (via system calls) there would be no memory leakage to worry
about.

You're correct - I'm using 'RMagick' - and it uses a large amount of memory.
But that's not really the overall point.  My overall point is how to
properly handle a rails app that uses a great deal of memory during each request. I'm pretty sure this happens in other rails applications that
don't happen to use 'RMagick'.

Personally, I'll simply say: call the GC more often. Seriously. I mean it. It's not *that* slow, not at all. In fact, I call GC.start explicitly inside of my ubygems.rb due to stuff I have observed before:

http://blog.ra66i.org/archives/informatics/2007/10/05/calling-on-the-gc-after-rubygems/ - N.B. This isn't "FIXED"; it's still a good idea (gem 1.0.1).
http://zdavatz.wordpress.com/2007/07/18/heap-fragmentation-in-a-long-running-ruby-process/

Now, by my reckoning (and a few production apps seem to be showing this empirically (purely empirical, sorry)), we should be calling on the GC whilst loading up the apps. I mean, come on, when is a really serious number of temporary objects created? Actually, it's when rubygems loads, and that's the first thing that happens in, hmm, probably over 90% of ruby processes out there.


Yes, I faced huge memory usage issues with other things not related to
image processing and found that a good approach was to move them out of
the request-response cycle and into an out-of-band background job.


So far, running the GC under fastcgi has given me pretty good results. The zombie issue with fastcgi is a known issue with mod_fastcgi, and I'm pretty
sure it's unrelated to RMagick or garbage collection.


Yes, but even if you "reclaim" the memory with GC, there will be pieces
that won't ever be GC'ed, since they leaked on the C side, outside
GC control (some of the RMagick and ImageMagick mysteries).

Sure, but leaks are odd things. Some processes that appear to be leaking are really just fragmenting (allocating more RAM due to lack of 'usable' space on 'the heap'). Call the GC more often, take a 0.01% performance hit, and monitor. I bet it'll get better. In fact, you can drop fragmentation in the first allocated segment significantly just by calling GC.start after a rubygems load, if you have more than a few gems.


Can you tell me how you addressed the "schedule" of the garbage
collection execution on your previous scenario? AFAIK most of the
frameworks or servers don't impose to the user how often GC should be
performed.

In fact there are many rubyists who hate the idea of splatting GC.start into processes. Given what I've seen, I'm willing to reject that notion completely. Test yourself, YMMV.

FYI, even on windows under the OCI, where performance for the interpreter sucks, really really hard, I couldn't reliably measure the runtime of a call to GC.start after loading rubygems. I don't know what kind of 'performance' people are after, but I can't see the point in not running the GC more often, especially for 'more common' daemon loads. Furthermore, hitting the kernel for more allocations more often is actually pretty slow too, so this may actually even result in faster processes under *certain* conditions.
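For what it's worth, the cost of a single explicit collection is easy to time with the stdlib Benchmark module; the 100,000-object heap padding here is an arbitrary stand-in for a loaded app:

```ruby
require 'benchmark'

# Pad the heap with throwaway objects so the sweep has real work to do,
# then time one explicit full collection.
junk = Array.new(100_000) { |i| "object #{i}" }
junk = nil

elapsed = Benchmark.realtime { GC.start }
puts format('GC.start took %.4f seconds', elapsed)
```

On a modern MRI this typically comes out at a few milliseconds, which is in line with the "couldn't reliably measure it" experience above.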

Running a lib like RMagick, I would say you *should* be doing this, straight up, no arguments.


In the previous scenario I was using fast_cgi with rails. In my previous
reply I provided a link to the rails fastcgi dispatcher.

http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi

In addition, other languages and other languages' web frameworks have provisions to control garbage collection (for languages that have garbage
collection, of course).


I'll bet it's rails specific, or you should take a look at the fcgi ruby
extension, since it is responsible, ruby-side, for bridging both
worlds.


This is done in the Rails FastCGI dispatcher. I believe that the equivalent of this in Mongrel is the Mongrel Rails dispatcher. Since the Mongrel Rails dispatcher is distributed as a part of Mongrel, I'd say this code is owned
by Mongrel, which bridges these two worlds when using mongrel as a
webserver.

It doesn't *really* matter where you run the GC. It matters that it runs, how often, and what it's doing. If you're actually calling on the GC and freeing nothing, that's stupid, but if you've run RMagick up, just call GC.start anyway, and I'm pretty sure it'll help. There's certainly no harm in investigating this, unless you're doing something silly with weakrefs.


Then you could provide a different Mongrel Handler that performs
that, or even a series of GemPlugins that provide a gc:start instead
of the plain 'start' command the mongrel_rails script provides.


# Run a full collection every 1,000 requests, via a Rails before_filter.
$occasional_gc_run_counter = 0
before_filter :occasional_gc_run

def occasional_gc_run
  $occasional_gc_run_counter += 1
  if $occasional_gc_run_counter > 1_000
    $occasional_gc_run_counter = 0
    GC.start
  end
end

Or whatever. It doesn't really matter that much where you do this, or when, it just needs to happen every now and then. More importantly, add a GC.start to the end of environment.rb, and you will have literally half the number of objects in ObjectSpace.
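If you want to check that ObjectSpace claim on your own app, here's a rough sketch; the 50,000 throwaway objects are an arbitrary simulation of load-time garbage:

```ruby
# Simulate the garbage left behind by loading an app (e.g. rubygems
# parsing specs), then count live objects before and after a forced
# collection. The exact ratio depends on what was loaded, but the
# count should drop noticeably.
Array.new(50_000) { Object.new }  # unreferenced, load-time-style garbage

before = ObjectSpace.each_object(Object).count
GC.start
after = ObjectSpace.each_object(Object).count

puts "objects before GC.start: #{before}"
puts "objects after GC.start:  #{after}"
```
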

On a personal note, I believe it is not the responsibility of Mongrel, as a
webserver, to take care of the garbage collection and leakage issues of
the VM on which your application runs. In any case, the GC of the VM
(MRI Ruby) should be enhanced to work better with heavy load and long-running
environments.

Right, and it's not just the interpreter, although indirection around this stuff (such as compacting) can help.


Ruby provides an API to access and call the garbage collector. This
gives ruby application developers the ability to control when garbage
collection is run, because in some cases there may be an
application-specific reason to prevent or explicitly run the GC. Web servers are a good example of applications where state may help determine a better time to run the GC. As you're serving each request, you're generally allocating a number of objects, then rendering output, then moving on to the
next request.

By limiting the GC to run in between requests rather than during requests,
you are trading request time for latency between requests.  This is a
trade-off that I think web application developers should decide on, but by no
means should it be a default or silver bullet for all. My position is
that this should just be an option within Mongrel as a web server.


Right, I think this is important too. You're absolutely right that there's no specific place to provide a generic solution. In rails the answer may be simple, but that's because rails' outer architecture is simplistic. No threads, no out-of-request processing, and so on.

--gc-interval maybe?

Now that you convinced me and proved your point, having the option to
perform it (optionally, not forced) will be something good to have.

Surely you can just:

require 'thread'
GC_FORCE_INTERVAL = 30 # seconds; tune to your load
Thread.new { loop { sleep GC_FORCE_INTERVAL; GC.start } }

In environment.rb in that case.

Of course, this is going to kill performance under evented_mongrel, thin and so on. I'd stay away from threaded solutions. _why blogged years ago about the GC, trying to remind people that we actually have control. I know ruby is supposed to abstract memory problems etc away from us, and for the most part it does, but hey, no one's perfect, right? :-)

http://whytheluckystiff.net/articles/theFullyUpturnedBin.html

Patches are Welcome ;-)

Have fun! :o)

_______________________________________________
Mongrel-users mailing list
Mongrel-users@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-users
