Hi Adam,

Thanks for the graphs and the tests. I'm definitely interested in digging a bit deeper to find out what could be the cause of this.
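If it's not too painful to re-run, executor GC logs from one coarse-grained and one fine-grained run would help too. Here's a minimal sketch of how I'd capture them, assuming a plain SparkConf and a HotSpot JVM (the log path is hypothetical):

    import org.apache.spark.SparkConf

    // The only intentional delta between the two runs is spark.mesos.coarse;
    // the extra JVM options just pick the collector and capture GC logs.
    val conf = new SparkConf()
      .set("spark.mesos.coarse", "true") // "false" for the fine-grained run
      .set("spark.executor.extraJavaOptions",
        "-XX:+UseG1GC " +                // or -XX:+UseConcMarkSweepGC for the CMS runs
        "-XX:+PrintGCDetails -XX:+PrintGCDateStamps " +
        "-Xloggc:/tmp/spark-executor-gc.log") // hypothetical path on each node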
Do you have the Spark driver logs for both runs? I've also appended a rough sketch of how I read the workload below the quoted thread, so you can correct me if I'm picturing it wrong.

Tim

On Mon, Nov 30, 2015 at 9:06 AM, Adam McElwee <[email protected]> wrote:

> To eliminate any skepticism around whether CPU is a good performance metric for this workload, I did a couple of comparison runs of an example job to demonstrate a more universal change in performance metrics (stage/job time) between coarse- and fine-grained mode on Mesos.

> The workload is identical here: pulling tgz archives from S3, parsing JSON lines from the files, and ultimately creating documents to index into Solr. The tasks are not inserting into Solr (just to let you know that there's no network side effect of the map task). The runs are on the exact same hardware in EC2 (m2.4xlarge, with 68GB of RAM and 45GB of executor memory) and the exact same JVM, and the results don't depend on the order in which the jobs run, meaning I get the same results whether I run the coarse-grained or the fine-grained job first. No other frameworks/tasks are running on the Mesos cluster during the test. I see the same results whether it's a 3-node cluster or a 200-node cluster.

> With the CMS collector, the map stage takes roughly 2.9h in fine-grained mode and 3.4h in coarse-grained mode. Because both modes initially start out performing similarly, the total execution time gap widens as the job size grows. To put that another way, the difference is much smaller for jobs/stages under 1 hour. When I submit this job for a much larger dataset that takes 5+ hours, the difference in total stage time moves closer to roughly 20-30% longer execution time.

> With the G1 collector, the map stage takes roughly 2.2h in fine-grained mode and 2.7h in coarse-grained mode. Again, the fine-grained and coarse-grained tests are on the exact same machines with the exact same dataset, changing only spark.mesos.coarse to true/false.

> Let me know if there's anything else I can provide here.

> Thanks,
> -Adam

> On Mon, Nov 23, 2015 at 11:27 AM, Adam McElwee <[email protected]> wrote:
>> On Mon, Nov 23, 2015 at 7:36 AM, Iulian Dragoș <[email protected]> wrote:
>>> On Sat, Nov 21, 2015 at 3:37 AM, Adam McElwee <[email protected]> wrote:
>>>> I've used fine-grained mode on our Mesos Spark clusters until this week, mostly because it was the default. I started trying coarse-grained mode because of the recent chatter on the mailing list about wanting to move the Mesos execution path to coarse-grained only. The odd thing is, coarse-grained vs. fine-grained mode yields drastically different cluster utilization metrics for any of the jobs I've tried out this week.

>>>> If this is best as a new thread, please let me know, and I'll try not to derail this conversation. Otherwise, details below:

>>> I think it's OK to discuss it here.

>>>> We monitor our Spark clusters with Ganglia, and historically we maintain at least 90% CPU utilization across the cluster. Making a single configuration change to use coarse-grained execution instead of fine-grained consistently yields a CPU utilization pattern that starts around 90% at the beginning of the job and then slowly decreases over the next 1-1.5 hours to level out around 65% CPU utilization on the cluster. Does anyone have a clue why I'd be seeing such a negative effect from switching to coarse-grained mode? GC activity is comparable in both cases.
>>>> I've tried 1.5.2, as well as the 1.6.0 preview tag that's on GitHub.

>>> I'm not very familiar with Ganglia and how it computes utilization, but one thing comes to mind: did you enable dynamic allocation in coarse-grained mode?

>> Dynamic allocation is definitely not enabled. The only delta between runs is adding --conf "spark.mesos.coarse=true" to the job submission. Ganglia is just pulling stats from procfs, and I've never seen it report bad results. If I sample any of the 100-200 nodes in the cluster, dstat reflects the same average CPU that I'm seeing reflected in Ganglia.

>>> iulian
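As promised above, here's roughly how I picture the map stage you described, so we can rule out any difference in how we're reading it. This is only a sketch under my own assumptions (the bucket path, an "id" field in the JSON, and commons-compress on the classpath are all guesses), not your actual job:

    import java.util.zip.GZIPInputStream

    import scala.io.Source

    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream
    import org.apache.spark.{SparkConf, SparkContext}
    import org.json4s._
    import org.json4s.jackson.JsonMethods.parse

    object TgzJsonSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("tgz-json-sketch")
          .set("spark.mesos.coarse", "true") // flip to "false" for the fine-grained run
        val sc = new SparkContext(conf)

        // One (path, stream) pair per archive, so each task unpacks one tgz.
        val lines = sc.binaryFiles("s3n://some-bucket/archives/*.tgz") // hypothetical path
          .flatMap { case (_, stream) =>
            val tar = new TarArchiveInputStream(new GZIPInputStream(stream.open()))
            // Walk the tar entries lazily; each file's lines are consumed
            // before getNextTarEntry advances to the next entry.
            Iterator
              .continually(tar.getNextTarEntry)
              .takeWhile(_ != null)
              .filter(_.isFile)
              .flatMap(_ => Source.fromInputStream(tar).getLines())
          }

        // Stand-in for the JSON-to-document step (no Solr indexing, as in
        // the test runs). Assumes well-formed JSON lines with an "id" field.
        val docs = lines.map { line =>
          parse(line) \ "id" match {
            case JString(id) => id
            case _ => ""
          }
        }

        println(s"documents built: ${docs.count()}")
        sc.stop()
      }
    }

If the real job looks materially different from this, that would be good to know.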
