Use the configuration below if you are using Spark 1.2:
SET spark.shuffle.consolidateFiles=true;
SET spark.rdd.compress=true;
SET spark.default.parallelism=1000;
SET spark.deploy.defaultCores=54;
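If you would rather set these programmatically, a rough SparkConf equivalent is below (a sketch; note that spark.deploy.defaultCores is read by the standalone master, so it usually belongs in the master's configuration rather than the application's):

import org.apache.spark.{SparkConf, SparkContext}

// Same values as the SET commands above; tune for your cluster.
// spark.deploy.defaultCores is omitted: it is a standalone-master setting.
val conf = new SparkConf()
  .set("spark.shuffle.consolidateFiles", "true")
  .set("spark.rdd.compress", "true")
  .set("spark.default.parallelism", "1000")
val sc = new SparkContext(conf)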
Thanks
Puneet.
-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Friday, February 13, 2015 4:46 PM
To: Igor Petrov
Cc: user@spark.apache.org
Subject: Re: Tuning number of partitions per CPU
18 cores or 36? It probably doesn't matter.
In a case like this, where there is per-partition overhead from setting up the DB
connection, it may indeed not help to chop the data up more finely than your
total parallelism, although that would imply quite a large overhead per
connection. Are you doing any other expensive initialization per partition in
your code?
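For illustration, a sketch of the shape that incurs such overhead (the URL and lookup logic here are placeholders): any setup inside mapPartitions runs once per partition, so 108 partitions pay it three times as often as 36 do.

import java.sql.{Connection, DriverManager}
import org.apache.spark.rdd.RDD

// Sketch only: connection setup is paid once per partition,
// independent of how many rows the partition holds.
def withLookups(ids: RDD[Long]): RDD[String] =
  ids.mapPartitions { iter =>
    val conn: Connection =
      DriverManager.getConnection("jdbc:postgresql://db:5432/app") // placeholder URL
    try {
      // Materialize so the connection can be closed before returning.
      iter.map(id => s"$id -> ${conn.getCatalog}").toList.iterator
    } finally {
      conn.close()
    }
  }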
You might also check some other basic things: are you bottlenecked on the DB
(probably not), and are there task stragglers drawing out the completion time?
On Fri, Feb 13, 2015 at 11:06 AM, Igor Petrov igorpetrov...@gmail.com wrote:
Hello,
In the Spark programming guide
(http://spark.apache.org/docs/1.2.0/programming-guide.html) there is a
recommendation: "Typically you want 2-4 partitions for each CPU in your
cluster" (for our 36 cores, that would be 72-144 partitions).
We have a Spark master and two Spark workers, each with 18 cores and 18
GB of RAM.
In our application we use JdbcRDD to load data from a DB and then cache it.
We load entities from a single table; there are currently 76 million
entities (each about 160 bytes in memory). We call count() during
application startup to force the entities to load.
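Simplified, the load has this shape (the connection URL, table, and column names below are placeholders; the two '?' in the query are required by JdbcRDD and are bound to each partition's id range):

import java.sql.DriverManager
import org.apache.spark.rdd.JdbcRDD

// sc is the existing SparkContext.
val entities = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:postgresql://db:5432/app"),
  "SELECT id, payload FROM entities WHERE id >= ? AND id <= ?",
  1L,          // lowerBound
  76000000L,   // upperBound
  36,          // numPartitions: 36, 72, or 108 in the measurements below
  rs => (rs.getLong(1), rs.getString(2))
).cache()
entities.count() // forces the load at startup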
Here are our measurements for the count() operation (cores x partitions = time):
36x36 = 6.5 min
36x72 = 7.7 min
36x108 = 9.4 min
So despite the recommendation, the most efficient setup is one partition
per core. What is the reason for the above recommendation?
Java 8, Apache Spark 1.1.0