On 25 Mar 2008, at 15:26, Kirk Haines wrote:
> On Tue, Mar 25, 2008 at 4:40 AM, James Tucker <[EMAIL PROTECTED]> wrote:
>> Forgive me for not having read the whole thread, however, there is one
>> thing that seems to be really important, and that is, ruby hardly ever
>> runs the damned GC. It certainly doesn't do full runs nearly often
>> enough (IMO).
>
> There's only one kind of garbage collection sweep.  And yeah,
> depending on what's happening, GC may not run very often.  That's not
> generally a problem.

Sure, inside ruby there's only one kind of run, but....

>> Also, implicit OOMEs or GC runs quite often DO NOT affect the
>> extensions correctly. I don't know what rmagick is doing under the hood
>> in this area, but having been generating large portions of country maps
>> with it (and moving away from it very rapidly), I know the GC doesn't
>> do "The Right Thing".
>
> There should be no difference between a GC run that is initiated by
> the interpreter and one that is initiated by one's code.  It ends up
> calling the same thing in gc.c.  Extensions can easily mismanage
> memory, though, and I have a hunch about what's happening with
> rmagick.

I just realised the obvious truth, that ruby isn't actually running  
the GC under those OOME conditions.

>> The first port of call is GC_MALLOC_LIMIT and friends. For any small
>> script that doesn't breach that value, the GC simply doesn't run. More
>> than this, RMagick, in its apparent 'wisdom', never frees memory if the
>> GC never runs. Seriously, check it out. Make a tiny script, and make a
>> huge image with it. Hell, make 20, get an OOME, and watch for a run of
>> the GC. The OOME will reach your code before the GC calls on RMagick to
>> free.
>>
>> Now, add a call to GC.start, and no OOME. Despite the limitations of it
>> (ruby performance only IMO), most of the above experience was built up
>> on windows, and last usage was about 6 months ago, FYI.
>
> My hunch is that rmagick is allocating large amounts of RAM outside of
> Ruby.  It registers its objects with the interpreter, but the RAM
> usage in rmagick itself doesn't count against GC_MALLOC_LIMIT because
> Ruby didn't allocate it, so doesn't know about it.

Yup, it's ImageMagick, un-patched and they don't provide afaik a  
callback to replace malloc, or maybe that's an rmagick issue.

> So, it uses huge amounts of RAM, but doesn't use huge numbers of
> objects.  Thus you never trigger a GC cycle by exceeding the
> GC_MALLOC_LIMIT nor by running out of object slots in the heap.  I'd
> have to go look at the code to be sure, but the theory fits the
> behavior that is described very well.

Right. In fact, I think the OOME actually comes from outside of ruby
(unverified), and ruby can't or won't run the GC before going down. The
free() calls inside RMagick / ImageMagick aren't happening without a
call to GC.start, so somewhere, somehow, that call is being used to
trigger frees in the framework. Personally, I think this is bad design,
and the really common complaints may suggest so too. However, I don't
know what their domain-specific issues and limitations are. Maybe it's
an ImageMagick thing.

When an OOME is created inside ruby, the interpreter calls GC.start
before going down. I started talking to zenspider about this stuff, and
eventually he just pointed me at gc.c, fair enough. I still hold the
opinion that an OOME hitting the interpreter (from whatever source)
should attempt to invoke the GC. Of course, a hell of a lot of
software doesn't check the result of a call to malloc(), tut tut.
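
To make that opinion concrete, here is a minimal sketch of "run the GC
and try once more when an OOME arrives". The helper name retry_after_gc
is made up for this mail; it is not part of ruby, Mongrel, or RMagick.
It also assumes the failure actually reaches ruby as a NoMemoryError.
An extension that ignores a failed malloc() will more likely just take
the process down, and nothing here can help with that.

```ruby
# Hypothetical helper (not part of any library): run the block, and if it
# raises NoMemoryError, force one GC pass and retry exactly once.
def retry_after_gc
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue NoMemoryError
    raise if attempts > 1 # already retried once, so give up
    GC.start              # gives extension free functions a chance to run
    retry
  end
end
```

The block receives the attempt number, which is only there to make the
behaviour easy to observe; real code would ignore it.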

Tool: http://ideas.water-powered.com/projects/libgreat

> I don't think this is a case for building GC.foo memory management
> into Mongrel, though.  As I think you are suggesting, just call
> GC.start yourself in your code when necessary.  In a typical Rails app
> doing big things with rmagick, the extra time to do GC.start at the
> end of the image manipulation, in the request handling, isn't going to
> be noticeable.

Absolutely right, and yes, this is my opinion.
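
For what it's worth, the pattern we are both describing can be sketched
in a few lines. The name with_gc_after is invented for illustration, and
the image work would go inside the block; nothing here is Mongrel or
Rails API.

```ruby
# Hypothetical helper: run some memory-heavy work, then force a GC pass
# whether or not the work raised. Returns the block's value.
def with_gc_after
  yield
ensure
  GC.start
end

# Usage sketch (build_thumbnail is a made-up application method):
#   with_gc_after { build_thumbnail(upload) }
```

The ensure clause means the collection still happens when the block
raises, which is usually the case where the memory matters most.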

>> But that's not really the overall point.  My overall point is how to
>> properly handle a rails app that uses a great deal of memory during
>> each request.  I'm pretty sure this happens in other rails applications
>> that don't happen to use 'RMagick'.
>>
>> Personally, I'll simply say call the GC more often. Seriously. I mean
>> it. It's not *that* slow, not at all. In fact, I call GC.start
>> explicitly inside of my ubygems.rb due to stuff I have observed before:
>
> I completely concur with this.  If there are issues with huge memory
> use (most likely caused by extensions making RAM allocations outside
> of Ruby's accounting, so implicit GC isn't triggered), just call
> GC.start in one's own code.
>
>> Now, by my reckoning (and a few production apps seem to be showing
>> empirically (purely empirical, sorry)) we should be calling on the GC
>> whilst loading up the apps. I mean come on, when are a really serious
>> number of temporary objects being created? Actually, it's when rubygems
>> loads, and that's the first thing that happens in, hmm, probably over
>> 90% of ruby processes out there.
>
> Just as a tangent, I do this in Swiftiply.  I make an explicit call to
> GC.start after everything is loaded and all configs are parsed, just
> to make sure execution is going into the main event loop with as much
> junk cleaned out as possible.

I've done similar in anything that runs as a fire-and-forget style
daemon. You know, the kinds of things that get set up once and run for
1 to 20 years. There are several that I have never restarted. No rails,
though. With these kinds of things I also simply don't want to waste
RAM on silly fragmentation, where the next allocation takes you up to a
percentage that registers on medium-aged machines. IIRC there's one in
my copy of analogger too, or maybe you had that in there already :-)
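
If anyone wants to see what a post-load GC.start actually reclaims in
their own app, a rough measurement could look like the sketch below. The
method name live_object_delta is made up, and it relies on
ObjectSpace.count_objects, which exists in ruby 1.9 and later but not in
1.8. The numbers vary by app and interpreter, so treat it as a probe
rather than a benchmark.

```ruby
# Hypothetical probe: count occupied object slots before and after a
# forced GC pass. Call it once everything has been loaded.
def live_object_delta
  live = ->(counts) { counts[:TOTAL] - counts[:FREE] }
  before = live.call(ObjectSpace.count_objects)
  GC.start
  after = live.call(ObjectSpace.count_objects)
  { before: before, after: after }
end

# Usage sketch, e.g. at the end of a boot script:
#   delta = live_object_delta
#   warn "live objects: #{delta[:before]} -> #{delta[:after]}"
```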

>> Or whatever. It doesn't really matter that much where you do this, or
>> when, it just needs to happen every now and then. More importantly, add
>> a GC.start to the end of environment.rb, and you will have literally
>> half the number of objects in ObjectSpace.
>
> This makes sense to me.
>
> I could also see providing a 2nd Rails handler that had some GC
> management stuff in it, along with some documentation on what it
> actually does or does not do, so people can make an explicit choice to
> use it, if they need it.  I'm still completely against throwing code
> into Mongrel itself for this sort of thing.  I just prefer not to
> throw more things into Mongrel than we really _need_ to, when there is
> no strong argument for them being inside of Mongrel itself.  GC.start
> stuff is simple enough to put into one's own code at appropriate
> locations, or to put into a customized Mongrel handler if one needs
> it.

If it wasn't app specific I'd say put it in mongrel. It is though, and
people's tendency to pre-optimize probably makes this pointless.

I mean, the cost of doing it in a thread under eventmachine is way
higher than the RAM usage costs for pure ruby apps, at least for my
pure ruby apps: 20-40 MB vs. lots of req/sec.

One could look at better alternatives, like add_timer(), but that
route tends towards bloat, so your original suggestion, to put it in
the app configuration, is what I would choose.
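
As a sketch of the kind of thing that could live in app code instead of
a thread or a timer: a counter that forces a GC pass every Nth request.
The class name EveryNthGC and its methods are invented for this mail,
and it is not thread-safe as written, so a threaded server would need a
Mutex around tick.

```ruby
# Hypothetical helper: call #tick once per request; every Nth call forces
# a GC pass. Not thread-safe as written.
class EveryNthGC
  attr_reader :runs

  def initialize(every)
    raise ArgumentError, "every must be a positive integer" unless every.is_a?(Integer) && every > 0
    @every = every
    @seen = 0
    @runs = 0
  end

  # Returns true when this call triggered a GC pass, false otherwise.
  def tick
    @seen += 1
    return false unless (@seen % @every).zero?
    GC.start
    @runs += 1
    true
  end
end
```

Picking N is the app-specific part, which is more or less the argument
for keeping this out of Mongrel.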

> Maybe this simply needs to be documented in the body of Mongrel  
> documentation?

Maybe not even there. I think research needs to be done into the  
longer running effects of the GC under real environments. I know some  
people have done some (including myself), but the results are never  
released in public. The GC also seems to be one of those topics, as  
it's so close to performance, where people are happy to see how high  
up the wall they can go, prior to doing research.

With regard to mongrel and this stuff, it's really not a mongrel  
issue. Mongrel is a great citizen wrt the GC (at least by comparison  
to a lot of other code).

Particularly bad citizens in this area include:
  - Every single pure ruby pdf lib I've seen
  - rubygems (by way of the spec loading semantics, not rubygems itself,
    kinda; let's just say I'd do it differently, but by design, not
    implementation)
  - rails
  - rmagick

>
>
>
> Kirk Haines
> _______________________________________________
> Mongrel-users mailing list
> Mongrel-users@rubyforge.org
> http://rubyforge.org/mailman/listinfo/mongrel-users
