Hi Xuelin,

This type of question is probably better asked on the spark-user mailing list, u...@spark.apache.org <http://apache-spark-user-list.1001560.n3.nabble.com>.
Do you mean that the very first set of tasks takes 300~500 ms to deserialize? That is most likely the time taken to ship the jars from the driver to the executors. You should only pay this cost once per SparkContext (assuming you are not adding more jars later on). You could try simply running the same task again, from the same SparkContext, and see whether it still takes that much time to deserialize the tasks.

If you really want to eliminate that initial time spent sending the jars, you could ensure that the jars are already present on the executors, so Spark doesn't need to send them at all. (Of course, this makes it harder to deploy new code; you'd still need to update those jars *somehow* when you do.)

Hope this helps,
Imran

On Sat, Nov 22, 2014 at 6:52 AM, Xuelin Cao <xuelin...@yahoo.com.invalid> wrote:

> In our experimental cluster (1 driver, 5 workers), we tried the simplest
> example: sc.parallelize(Range(0, 100), 2).count
>
> In the event log, we found the executor takes too much time on
> deserialization, about 300~500 ms, while the execution time is only 1 ms.
>
> Our servers have 2.3 GHz CPUs with 24 cores. And we have set the
> serializer to org.apache.spark.serializer.KryoSerializer.
>
> The question is: is it normal for the executor to take 300~500 ms on
> deserialization? If not, any clue for performance tuning?
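(For anyone following along, a minimal sketch of the experiment suggested above: run the same job twice in one SparkContext and compare wall-clock times. The first run pays the one-time cost of shipping jars to the executors; the second should reflect steady-state behavior. The object/method names here are illustrative, and the per-task deserialization time itself is what you'd check in the event log or web UI, not just the driver-side timing.)

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DeserTimingSketch {
  // Crude driver-side timer; the authoritative per-task
  // deserialization numbers are in the event log / web UI.
  def time[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label: $elapsedMs%.1f ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("deser-timing")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    // First run: includes the one-time cost of shipping jars to executors.
    time("first run")  { sc.parallelize(Range(0, 100), 2).count() }
    // Second run, same SparkContext: jars are already on the executors.
    time("second run") { sc.parallelize(Range(0, 100), 2).count() }

    sc.stop()
  }
}
```

If the second run's tasks still show hundreds of milliseconds of deserialization, something other than jar shipping is involved. To pre-place jars as suggested, one option is pointing `spark.executor.extraClassPath` at a directory on each worker where you have already copied the jars.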