Hi all,

I'm running Spark 1.2.0 in standalone mode, on clusters and servers of different sizes, with all of my data cached in memory. Basically I have a dataset of about 8 GB with roughly 37k columns, and I'm running different configurations of a binary LogisticRegressionWithLBFGS.

When I run Spark on 9 servers (1 master and 8 slaves) with 32 cores each, I notice that CPU usage varies between 20% and 50% (aggregated over the 9 servers in the cluster). First I tried repartitioning the RDDs to the total number of worker cores (256), but that didn't help. Then I tried setting the property spark.default.parallelism to the same number (256), but that didn't increase CPU usage either. Looking at the Spark monitoring UI, I saw that some stages took 52 s to complete.

My last attempt was to run some jobs in parallel, but when I ran 4 jobs concurrently, the total CPU time spent to complete them increased by about 10%, so job parallelism didn't help either. In the monitoring UI I noticed that when running jobs in parallel, the stages complete together: if I have 4 stages running in parallel (A, B, C, and D), and A, B, and C finish first, they wait for D before all 4 are marked as completed. Is that right?

Is there any way to improve CPU usage when running on large servers? Is spending more total time when running jobs in parallel expected behaviour?
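For reference, this is roughly what my repartition / parallelism attempt looks like (a simplified sketch; "trainingData" stands in for my real cached RDD of LabeledPoint, and the app name is made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

val conf = new SparkConf()
  .setAppName("BinaryLRWithLBFGS") // hypothetical name
  .set("spark.default.parallelism", "256") // 8 slaves * 32 cores

val sc = new SparkContext(conf)

// trainingData: RDD[LabeledPoint] loaded elsewhere, ~8 GB, ~37k columns
val repartitioned: RDD[LabeledPoint] = trainingData
  .repartition(256) // match total worker cores
  .cache()

val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(repartitioned)
```

Neither the repartition nor the parallelism setting moved CPU usage above the range I described.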
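And this is a sketch of how I submit the 4 jobs concurrently, from separate threads via Futures (simplified from my actual code; "configs" and "trainingData" are placeholders, and the FAIR scheduler pool assumes spark.scheduler.mode=FAIR is set on the context):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

// Each concurrent action submitted to the same SparkContext becomes a
// separate Spark job; with the FAIR scheduler they share the cluster.
sc.setLocalProperty("spark.scheduler.pool", "training") // hypothetical pool name

val futures = configs.map { cfg => // cfg: one of my 4 model configurations
  Future {
    new LogisticRegressionWithLBFGS()
      .setNumClasses(2)
      .run(trainingData)
  }
}

val models = Await.result(Future.sequence(futures), Duration.Inf)
```

It is with this setup that total CPU time increases by ~10% compared with running the 4 jobs sequentially.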
Kind Regards, Dirceu