[
https://issues.apache.org/jira/browse/HBASE-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell resolved HBASE-2902.
-----------------------------------
Resolution: Duplicate
Stale issue. Superseded by recent blockcache / bucket cache related work.
> Improve our default shipping GC config. and doc -- along the way do a bit of
> GC myth-busting
> --------------------------------------------------------------------------------------------
>
> Key: HBASE-2902
> URL: https://issues.apache.org/jira/browse/HBASE-2902
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Reporter: stack
> Attachments: Fragger.java
>
>
> This issue is about improving the near-term story, working with our current
> lot: the slowly evolving (?) 1.6.x JVMs and CMS. (Longer term, another issue in
> hbase tracks the G1 story, and Todd is making a bit of traction over on the
> GC hotspot list.)
> At the moment we ship with CMS and i-CMS enabled by default. At a minimum,
> i-CMS does not apply on most hw hbase is deployed on -- i-CMS is for hw w/ 2
> or fewer processors -- and it seems as though we do not use multiple threads
> doing YG collections; i.e. -XX:+UseParNewGC "Use parallel threads in the new
> generation" (Here's what I see... it seems to be off in jdk6 according to
> http://www.md.pp.ru/~eu/jdk6options.html#UseParNewGC but then this says it's
> on by default when using CMS ->
> http://blogs.sun.com/jonthecollector/category/Java ... but then this says to
> enable it: http://www.austinjug.org/presentations/JDK6PerfUpdate_Dec2009.pdf.
> I see this when it's enabled: [Rescan (parallel) ... so it seems like it's off.
> Need to review the src code).
> We should make the above changes or at least doc them.
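One way to settle the "is ParNew actually on?" question without reading the source is to ask the running JVM which collectors it registered. The sketch below uses only the standard java.lang.management API; the class name GcInventory is made up for illustration. On the HotSpot builds I am aware of, the young collector reports itself as "ParNew" when parallel young collection is active and as "Copy" when it is serial, but treat those names as an assumption to verify on your own JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase.
final class GcInventory {

    /** Names of the collectors the running JVM actually registered. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** GC-related -XX flags given on the command line (JVM defaults are not listed). */
    static List<String> explicitGcFlags() {
        List<String> flags = new ArrayList<String>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-XX:") && (arg.contains("GC") || arg.contains("CMS"))) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println("collectors: " + collectorNames());
        System.out.println("explicit GC flags: " + explicitGcFlags());
    }
}
```

Running this with the same options the region server is started with shows what the JVM chose, as opposed to what the various docs say it should choose.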
> We should consider enabling GC logging by default. It's low cost apparently
> (citation below). We'd just need to do something about the log management.
> Not sure you can roll them -- investigate -- and anyway we should roll on
> startup at least so we don't lose GC logs across restarts.
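A minimal sketch of the "roll on startup" idea: move any existing GC log aside under a timestamped name before the server JVM is launched. This has to run in a wrapper or launcher, since the file named by -Xloggc is opened by the JVM itself at startup. In practice this would more likely live in the start script; it is shown in Java only to keep one language here. GcLogRoller and its method are hypothetical names, not existing HBase code.

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not part of HBase.
final class GcLogRoller {

    /**
     * If gcLog exists, renames it to "<name>.<yyyyMMdd-HHmmss>" in the same
     * directory and returns the renamed file. Returns null if there was
     * nothing to roll.
     */
    static File rollOnStartup(File gcLog, Date now) {
        if (!gcLog.exists()) {
            return null;
        }
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(now);
        File rolled = new File(gcLog.getParentFile(), gcLog.getName() + "." + stamp);
        if (!gcLog.renameTo(rolled)) {
            throw new IllegalStateException("could not rename " + gcLog + " to " + rolled);
        }
        return rolled;
    }

    public static void main(String[] args) {
        // The log path is passed in; it should match whatever -Xloggc points at.
        File rolled = rollOnStartup(new File(args[0]), new Date());
        System.out.println(rolled == null ? "nothing to roll" : "rolled to " + rolled);
    }
}
```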
> We should play with initiating ratios; maybe starting CMS earlier will push
> out the fragmented heap that brings on the killer stop-the-world collection.
> I read somewhere recently that invoking System.gc will run a CMS GC if CMS is
> enabled. We should investigate. If it ran the serial collector, we could at
> least doc. that users could run a defragmenting stop-the-world serial
> collection on 'off' times or at least make it so the stop-the-world happened
> when expected instead of at some random time.
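The System.gc() question can be checked empirically. On HotSpot, as far as I know, a plain System.gc() requests a stop-the-world full collection even when CMS is on, -XX:+ExplicitGCInvokesConcurrent turns it into a concurrent CMS cycle, and -XX:+DisableExplicitGC makes it a no-op. The sketch below (ExplicitGcProbe is a made-up name) calls System.gc() once and reports which collectors' counters moved, so the behaviour can be confirmed under each flag combination rather than taken on faith.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical probe, not part of HBase.
final class ExplicitGcProbe {

    /** Current collection count per registered collector. */
    static Map<String, Long> collectionCounts() {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    /** Calls System.gc() once and returns how many collections each collector added. */
    static Map<String, Long> probe() {
        Map<String, Long> before = collectionCounts();
        System.gc();
        Map<String, Long> delta = new LinkedHashMap<String, Long>();
        for (Map.Entry<String, Long> e : collectionCounts().entrySet()) {
            Long b = before.get(e.getKey());
            delta.put(e.getKey(), e.getValue() - (b == null ? 0L : b));
        }
        return delta;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Map<String, Long> delta = probe();
        long millis = (System.nanoTime() - start) / 1000000L;
        System.out.println("System.gc() took " + millis + " ms, collections added: " + delta);
    }
}
```

Comparing the output with GC logging enabled should show whether the explicit collection was a stop-the-world full collection or a concurrent cycle.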
> While here, let's do a bit of myth-busting. Here are a few postulates:
> + Keep the young generation small or at least, cap its size else it grows to
> occupy a large part of the heap
> The above is a Ryanism. Doing the above -- along w/ massive heap size -- has
> put off the fragmentation that others run into at SU at least.
> Interestingly, this document --
> http://www.google.com/url?sa=t&source=web&cd=1&ved=0CBcQFjAA&url=http%3A%2F%2Fmediacast.sun.com%2Fusers%2FLudovic%2Fmedia%2FGCTuningPresentationFISL10.pdf&ei=ZPtaTOiLL5bcsAa7gsl1&usg=AFQjCNHP691SIIE-6NSKccM4mZtm1U6Ahw&sig2=2cjvcaeyn1aISL2THEENjQ
> -- would seem to recommend near the opposite in that it suggests that when
> using CMS, do all you can to keep stuff in the YG. Avoid having stuff age up
> to the tenured heap if you can. This would seem to imply using a larger YG.
> Chatting w/ Ryan, the reason to keep the YG small is so we don't have long
> pauses doing YG collections. According to the above citation, it's not big
> YGs that cause long YG pauses but the copying of data (not sure if it's
> copying of data inside the YG or if it meant copying up to tenured --
> chatting w/ Ryan we thought there'd be no difference -- but we should
> investigate).
> I took a look at a running upload with a small heap admittedly. What I was
> seeing was that using our defaults, rare was anything in YG of age > 1 GC;
> i.e. near everything in YG was being promoted. This may have been a symptom
> of my small (default) heap but we should look into this and try and ensure
> objects are promoted because they are old, not because there is not enough
> space in YG.
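For the "promoted because there is no room" case, it helps to see how little of the young generation a survivor space actually is. With -XX:SurvivorRatio=N, eden is N times the size of one survivor space and there are two survivor spaces, so each survivor gets young/(N+2). -XX:TargetSurvivorRatio then sets the percentage of a survivor space the JVM aims to leave occupied after a young collection. The arithmetic below is a rough sketch that ignores the JVM's alignment and rounding; YoungGenSizing is a made-up name and the values in main are only illustrative.

```java
// Hypothetical helper for back-of-the-envelope sizing; ignores JVM alignment.
final class YoungGenSizing {

    /** Size of one survivor space: young / (SurvivorRatio + 2). */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden is whatever is left after the two survivor spaces. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    /** Survivor occupancy the JVM aims for, given TargetSurvivorRatio in percent. */
    static long targetSurvivorBytes(long survivorBytes, int targetSurvivorRatioPercent) {
        return survivorBytes * targetSurvivorRatioPercent / 100;
    }

    public static void main(String[] args) {
        long young = 64L * 1024 * 1024; // illustrative young generation size
        int ratio = 8;                  // illustrative SurvivorRatio
        long survivor = survivorBytes(young, ratio);
        System.out.println("eden bytes:            " + edenBytes(young, ratio));
        System.out.println("one survivor bytes:    " + survivor);
        System.out.println("target occupied bytes: " + targetSurvivorBytes(survivor, 50));
    }
}
```

If the data surviving a young collection is larger than that target, objects get tenured early regardless of age. -XX:+PrintTenuringDistribution shows the per-age volumes, which should confirm or rule this out.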
> + We should write a slab allocator or allocate memory outside of the JVM heap
> Thinking on this, slab allocator, while a lot of work, I can see it helping
> us w/ block cache, but what if memstore is the fragmented-heap maker? In
> this case, slab-allocator is only part of the fix. It should be easy to see
> which is the fragmented heap maker since we can turn off the cache easily
> enough (though it seems like it's accessed anyway even if disabled -- need to
> make sure it's not doing allocations to the cache in this case).
> Other things while on this topic. We need to come up w/ a loading that
> brings on the CMS fault that comes of a fragmented heap (CMS is
> non-compacting but apparently it will join together free blocks to make
> bigger ones so there is some anti-fragmenting behavior going on). Apparently
> lots of large, irregularly sized items are the ticket.
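A minimal sketch of such a loading, assuming the "large, irregularly sized items" recipe is right: hold a fixed number of live byte arrays of random size and keep replacing random ones, so that long-lived arrays die at different times and leave holes of many different sizes in the tenured generation. This is not the attached Fragger.java (which is not reproduced in this message); FragmentingLoad and its parameters are made up for illustration, and whether it actually provokes a CMS failure will depend on heap size and flags.

```java
import java.util.Random;

// Hypothetical load generator, not the attached Fragger.java.
final class FragmentingLoad {

    /**
     * Keeps `slots` byte arrays alive, each between minBytes and maxBytes long,
     * and overwrites one randomly chosen slot per iteration.
     * Returns the total number of bytes allocated.
     */
    static long run(int slots, int minBytes, int maxBytes, long iterations, long seed) {
        Random rnd = new Random(seed);
        byte[][] live = new byte[slots][];
        long allocated = 0;
        for (long i = 0; i < iterations; i++) {
            int size = minBytes + rnd.nextInt(maxBytes - minBytes + 1);
            // Replacing a random slot frees an old array of some other size.
            live[rnd.nextInt(slots)] = new byte[size];
            allocated += size;
        }
        return allocated;
    }

    public static void main(String[] args) {
        // Roughly 130 MB live on average; size the heap and the numbers to taste.
        long total = run(1024, 4 * 1024, 256 * 1024, 10000000L, 42L);
        System.out.println("allocated " + total + " bytes");
    }
}
```

Run it with CMS and GC logging enabled and watch for promotion or concurrent mode failures in the log.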