Hi Xuelin,

This type of question is probably better asked on the Spark user mailing
list, u...@spark.apache.org

Do you mean that the very first set of tasks takes 300 - 500 ms to
deserialize?  That is most likely the time taken to ship the jars from the
driver to the executors.  You should only pay this cost once per
SparkContext (assuming you are not adding more jars later on).  You could
try simply running the same job again, on the same SparkContext, and see
whether task deserialization still takes that long.
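A quick way to check is something like the following in spark-shell (a
sketch only -- it assumes a running SparkContext `sc`, and the `timed`
helper is mine; the per-task deserialization time itself you would read
from the event log or the web UI, not from this wall-clock timing):

```scala
// Run the same trivial job twice on the same SparkContext.
// The first run pays the one-time cost of shipping jars to the
// executors; the second run should show a much smaller task
// deserialization time in the event log / web UI.
def timed[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(f"$label: ${(System.nanoTime() - start) / 1e6}%.1f ms")
  result
}

timed("first run")  { sc.parallelize(Range(0, 100), 2).count() }
timed("second run") { sc.parallelize(Range(0, 100), 2).count() }
```

If the second run's tasks deserialize quickly, the initial 300 - 500 ms
was the one-time jar shipping cost rather than an ongoing problem.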

If you really want to eliminate that initial time to send the jars, you
could ensure that the jars are already on the executors, so Spark doesn't
need to send them at all.  (Of course, this makes it harder to deploy
new code; you'd still need to update those jars *somehow* when you do.)
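One way to set that up (a sketch; the /opt/myapp path is a hypothetical
example, and the jar must already be copied to the same location on every
worker node) is to put the jar on the executors' classpath via
spark-defaults.conf instead of shipping it with `--jars` or
`sc.addJar`:

```
# conf/spark-defaults.conf
# Hypothetical path; the jar must be pre-distributed to every worker.
spark.executor.extraClassPath  /opt/myapp/lib/myapp.jar
```

With this in place, Spark no longer needs to transfer the jar at job
submission, but any code change now requires re-copying the jar to all
workers yourself.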

hope this helps,
Imran


On Sat, Nov 22, 2014 at 6:52 AM, Xuelin Cao <xuelin...@yahoo.com.invalid>
wrote:

>
> In our experimental cluster (1 driver, 5 workers), we tried the simplest
> example:   sc.parallelize(Range(0, 100), 2).count
>
> In the event log, we found the executor takes too much time on
> deserialization, about 300 ~ 500ms, and the execution time is only 1ms.
>
> Our servers have 2.3 GHz CPUs with 24 cores.  And we have set the
> serializer to org.apache.spark.serializer.KryoSerializer .
>
> The question is, is it normal that the executor takes 300~500ms on
> deserialization?  If not, any clue for the performance tuning?
>
>
>
