I just want to chime in and say that I ran into the same problem when running with 64GB executors. For example, some of our tasks take 5 minutes to execute, of which 4 minutes are spent in GC. I'll try out smaller executors.
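For what it's worth, this is roughly the layout I'm planning to try next: many small heaps instead of a few 64GB ones, with GC logging turned on so I can compare GC time before and after. The numbers below are just a starting guess, not something validated in this thread (and --num-executors assumes YARN):

    spark-submit \
      --num-executors 100 \
      --executor-cores 2 \
      --executor-memory 8g \
      --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails" \
      <app jar and args>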
On Mon, Oct 6, 2014 at 6:35 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:

> Hi,
>
> The other option to consider is using G1 GC, which should behave better with large heaps. But pointers are not compressed in heaps larger than 32 GB, so you may be better off staying under 32 GB.
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Mon, Oct 6, 2014 at 8:08 PM, Mingyu Kim <m...@palantir.com> wrote:
>
>> Ok, cool. This seems to be a general issue in the JVM with very large heaps. I agree that the best workaround would be to keep the heap size below 32GB. Thanks, guys!
>>
>> Mingyu
>>
>> From: Arun Ahuja <aahuj...@gmail.com>
>> Date: Monday, October 6, 2014 at 7:50 AM
>> To: Andrew Ash <and...@andrewash.com>
>> Cc: Mingyu Kim <m...@palantir.com>, "user@spark.apache.org" <user@spark.apache.org>, Dennis Lawler <dlaw...@palantir.com>
>> Subject: Re: Larger heap leads to perf degradation due to GC
>>
>> We have used the strategy that you suggested, Andrew - using many workers per machine and keeping the heaps small (< 20gb).
>>
>> Using a large heap resulted in workers hanging or not responding (leading to timeouts). The same dataset/job for us will fail (most often due to Akka disassociation or fetch failure errors) with 10 cores / 100 executors, 60gb per executor, while it succeeds with 1 core / 1000 executors / 6gb per executor.
>>
>> When the job does succeed with more cores per executor and a larger heap, it is usually much slower than with the smaller executors (the same 8-10 min job taking 15-20 min to complete).
>>
>> The unfortunate downside of this has been that we have some large broadcast variables which may not fit into memory (and are unnecessarily duplicated) when using the smaller executors.
>>
>> Most of this is anecdotal, but for the most part we have had more success and consistency with more executors with smaller memory requirements.
>>
>> On Sun, Oct 5, 2014 at 7:20 PM, Andrew Ash <and...@andrewash.com> wrote:
>>
>>> Hi Mingyu,
>>>
>>> Maybe we should be limiting our heaps to 32GB max and running multiple workers per machine to avoid large GC issues.
>>>
>>> For a 128GB memory, 32 core machine, this could look like:
>>>
>>> SPARK_WORKER_INSTANCES=4
>>> SPARK_WORKER_MEMORY=32g
>>> SPARK_WORKER_CORES=8
>>>
>>> Are people running with large (32GB+) executor heaps in production? I'd be curious to hear if so.
>>>
>>> Cheers!
>>> Andrew
>>>
>>> On Thu, Oct 2, 2014 at 1:30 PM, Mingyu Kim <m...@palantir.com> wrote:
>>>
>>>> This issue definitely needs more investigation, but I just wanted to quickly check if anyone has run into this problem or has general guidance around it. We've seen a performance degradation with a large heap on a simple map task (i.e., no shuffle). We've seen the slowness starting at around a 50GB heap (i.e., spark.executor.memory=50g), and when we checked the CPU usage, there was just a lot of GC going on.
>>>>
>>>> Has anyone seen a similar problem?
>>>>
>>>> Thanks,
>>>> Mingyu
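For anyone finding this thread later: the G1 switch Otis mentions would presumably be passed to the executors via spark.executor.extraJavaOptions, along the lines of the sketch below, while keeping the heap under ~32GB per the compressed-pointer point. The exact values here are illustrative, not a tested recommendation from this thread; the extra GC-logging flags are only there to make before/after comparison easier.

    --conf spark.executor.memory=28g \
    --conf spark.executor.extraJavaOptions="-XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails"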