I think incorporating Tachyon (https://github.com/amplab/tachyon/wiki) is a better solution. I remember Matei saying that it was in his plan, but I'm not sure about the ETA.

On Thu, Jan 16, 2014 at 12:30 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> And, of course, there are the bigger-hammer-than-GC-tuning approaches using
> some combination of unchecked, off-heap and Tachyon.
>
>
> On Thu, Jan 16, 2014 at 11:54 AM, Tathagata Das <tathagata.das1...@gmail.com>
> wrote:
>
> > There are a bunch of tricks noted in the Tuning Guide
> > <http://spark.incubator.apache.org/docs/latest/tuning.html#memory-tuning>.
> > You may have seen them already, but I thought it's still worth mentioning
> > for the record.
> >
> > Besides those, if you are concerned about consistent latency (that is, low
> > variability in the job processing times), then using
> > concurrent-mark-and-sweep GC is recommended. Instead of big stop-the-world
> > GC pauses, there are many smaller pauses. This reduction in variability
> > comes at the cost of processing throughput though, so that's a tradeoff.
> >
> > TD
> >
> >
> > On Thu, Jan 16, 2014 at 11:35 AM, Kay Ousterhout <k...@eecs.berkeley.edu>
> > wrote:
> >
> > > Hi all,
> > >
> > > I'm finding that Java GC can be a major performance bottleneck when
> > > running Spark at high (>50% or so) memory utilization. What GC tuning
> > > have people tried for Spark and how effective has it been?
> > >
> > > Thanks!
> > >
> > > Kay

--
Binh Nguyen
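
For anyone who wants to try TD's concurrent-mark-and-sweep suggestion from the thread above, here is a minimal sketch of how the JVM options might be assembled. The flags themselves are standard HotSpot options for JVMs of this era (-XX:+UseConcMarkSweepGC was removed in later JDKs). The helper name build_java_opts is made up for illustration, and passing the result through the SPARK_JAVA_OPTS environment variable is an assumption about the Spark version in use; newer releases may expect a different setting, so check the configuration docs for your version.

```python
import os

# Standard HotSpot flag that selects the concurrent-mark-and-sweep collector.
CMS_FLAGS = ["-XX:+UseConcMarkSweepGC"]

# GC logging flags, useful for checking whether pauses actually got shorter.
GC_LOG_FLAGS = ["-verbose:gc", "-XX:+PrintGCDetails", "-XX:+PrintGCTimeStamps"]


def build_java_opts(existing="", extra_flags=()):
    """Append flags to an existing option string, skipping exact duplicates.

    Hypothetical helper for illustration; not part of Spark.
    """
    opts = existing.split()
    for flag in extra_flags:
        if flag not in opts:
            opts.append(flag)
    return " ".join(opts)


# Assumption: the cluster reads JVM options from SPARK_JAVA_OPTS.
current = os.environ.get("SPARK_JAVA_OPTS", "")
java_opts = build_java_opts(current, CMS_FLAGS + GC_LOG_FLAGS)

# Print the value so it can be exported before launching the job.
print(java_opts)
```

The printed string would be exported before starting the job. Comparing the GC log with and without the CMS flag should show the tradeoff TD describes: shorter individual pauses at some cost in throughput.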