Hello,

I have a Spark cluster of 4 nodes running in standalone mode: a master plus
3 worker nodes, with the available memory, CPUs, etc. configured.

I have a Spark application which is essentially an MLlib pipeline for
training a classifier, in this case a RandomForest, but it could be a
DecisionTree just for the sake of simplicity.

But when I submit the Spark application to the cluster via spark-submit, it
runs out of memory. Even though the executors are "taken"/created in the
cluster, they are essentially doing nothing (almost no CPU or memory
utilization) while the master seems to do all the work, which finally
results in an OOM.

My submission is the following:
spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
SparkPOC.jar 10 4.3

I am submitting from the master node.

By default it runs in client mode, in which the driver process is attached
to the spark-submit process (so the driver runs on the master node).
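
Would something along these lines be the right direction? (The host name and
the memory/core values below are just guesses for my setup):

spark-submit --master spark://master:7077 --deploy-mode cluster \
  --executor-memory 4g --executor-cores 2 --total-executor-cores 6 \
  --driver-class-path spark/sqljdbc4.jar --class DemoApp SparkPOC.jar 10 4.3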

Do I need to set something to make the MLlib algorithms parallelized and
distributed as well, or is it all driven by the parallelism (number of
partitions) of the input DataFrame?
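
For illustration, would something like this be enough (simplified Scala; the
input path and column names are placeholders)?

import org.apache.spark.ml.classification.RandomForestClassifier

// placeholder input path; repartition so the data is spread across executors
val train = spark.read.parquet("hdfs:///data/train").repartition(24)

val rf = new RandomForestClassifier()
  .setLabelCol("label")       // placeholder label column
  .setFeaturesCol("features") // placeholder features column

val model = rf.fit(train)     // should the training itself now run on the executors?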

Essentially it seems that all the work is done on the master while the rest
of the cluster sits idle.
Any hints on what to check?

Thx
Jakub
