I suppose up to now I took it as a given that any Java application that wants to do realtime, whether a webserver or a search application, needs GC tuning, but yeah, we should do more to highlight the importance of GC tuning, especially when failure to do so can be relatively catastrophic (a RegionServer shutting itself down). Ryan in particular has been talking up the topic a bunch (he did our performance tuning wiki page too). We could start a list of use cases and the tunings that helped alleviate GC woes for a particular cluster profile and load (so we'd have something to present at BAHUG?). Patrick, do you know who we might talk to on the MR/HDFS team regarding pauses? We were introduced to the NameNode Tuner once... we should talk to him again. It does seem to be a problem where one tuning does not suit all deploys.
Regarding Zhenyu's case, there is still work to do IMO. What I saw in his logs was a failed promotion from ParNew, something that could be helped by starting CMS collection earlier (among other things). He's also still on an older version of the JVM. While things are not timing out at the moment, IMO it's still 'broke' if it has such long pauses (Zhenyu, are you seeing 4-minute pauses in your GC logs?). Ryan would argue these are inevitable with CMS, but at least in the one case that I saw, some twiddling would seem to help.
Thanks Patrick,
St.Ack
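
P.S. To make "starting CMS collection earlier" a bit more concrete, here is a rough sketch of the kind of settings I mean, assuming they go into HBASE_OPTS in conf/hbase-env.sh on a HotSpot JVM that still ships CMS. The 70 percent threshold is only an illustrative starting point, not a recommendation and not what Zhenyu is running; the right value depends on heap size and load, so watch the GC log for failed promotions after changing it.

```shell
# Sketch only: illustrative values, assumed to live in conf/hbase-env.sh.
# Use CMS for the old generation and ParNew for the young generation.
HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"

# Start the concurrent collection once the old generation is 70% full
# (70 is an assumed example value), and stop the JVM from overriding that
# threshold with its own heuristics.
HBASE_OPTS="$HBASE_OPTS -XX:CMSInitiatingOccupancyFraction=70"
HBASE_OPTS="$HBASE_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

# Keep GC logging on so pauses and promotion failures show up in the logs.
HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

export HBASE_OPTS
```

Starting the concurrent cycle at a lower occupancy leaves more free room in the old generation for objects promoted out of ParNew, at the cost of CMS running more often.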
