It sounds like your computation just isn't CPU-bound, right? Or maybe only some stages are. It's not clear what work you are doing beyond the core LR.
Stages don't wait on each other unless one depends on the other. You'd have to clarify what you mean by running stages in parallel, for example what the interdependencies between them are.

On Fri, Feb 20, 2015 at 10:01 AM, Dirceu Semighini Filho <dirceu.semigh...@gmail.com> wrote:
> Hi all,
> I'm running Spark 1.2.0, in Stand alone mode, on different cluster and
> server sizes. All of my data is cached in memory.
> Basically I have a mass of data, about 8gb, with about 37k of columns, and
> I'm running different configs of an BinaryLogisticRegressionBFGS.
> When I put spark to run on 9 servers (1 master and 8 slaves), with 32 cores
> each. I noticed that the cpu usage was varying from 20% to 50% (counting
> the cpu usage of 9 servers in the cluster).
> First I tried to repartition the Rdds to the same number of total client
> cores (256), but that didn't help. After I've tried to change the
> property *spark.default.parallelism* to the same number (256) but that
> didn't helped to increase the cpu usage.
> Looking at the spark monitoring tool, I saw that some stages took 52s to
> be completed.
> My last shot was trying to run some tasks in parallel, but when I start
> running tasks in parallel (4 tasks) the total cpu time spent to complete
> this has increased in about 10%, task parallelism didn't helped.
> Looking at the monitoring tool I've noticed that when running tasks in
> parallel, the stages complete together, if I have 4 stages running in
> parallel (A,B,C and D), if A, B and C finishes, they will wait for D to
> mark all this 4 stages as completed, is that right?
> Is there any way to improve the cpu usage when running on large servers?
> Spending more time when running tasks is an expected behaviour?
>
> Kind Regards,
> Dirceu