Thanks for your advice. I have tried your recommended configuration, but SPARK_WORKER_CORES=32 with SPARK_WORKER_INSTANCES=1 still works better.
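For reference, a sketch of the two spark-env.sh variants being compared on each 16-core node (illustrative only, not the exact files used); across the 16 nodes the first variant advertises 16 * 32 = 512 task slots, i.e. 2x the 256 physical cores, which matches the "2 (or 3) * total cores" rule of thumb quoted below:

    # conf/spark-env.sh sketch, per 16-core node (illustrative)
    # Variant that works better here: one worker advertising 32 cores
    SPARK_WORKER_INSTANCES=1
    SPARK_WORKER_CORES=32

    # Variant that did not work as well: two workers with 16 cores each
    # SPARK_WORKER_INSTANCES=2
    # SPARK_WORKER_CORES=16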
In the official Spark standalone documentation, the description of SPARK_WORKER_INSTANCES says: "You can make this more than 1 if you have very large machines and would like multiple Spark worker processes." Maybe our 16-node cluster, with 16 cores and 24 GB of memory per node, is not a good fit for more than one worker instance.

Yi Tian wrote:
> Hi, myasuka
>
> Have you checked the JVM GC time of each executor?
>
> I think you should increase SPARK_EXECUTOR_CORES or SPARK_EXECUTOR_INSTANCES until you get enough concurrency.
>
> Here is my recommended config:
>
> SPARK_EXECUTOR_CORES=8
> SPARK_EXECUTOR_INSTANCES=4
> SPARK_WORKER_MEMORY=8G
>
> Note: make sure you have enough memory on each node, more than SPARK_EXECUTOR_INSTANCES * SPARK_WORKER_MEMORY.
>
> Best Regards,
>
> Yi Tian
> tianyi.asiainfo@
>
> On Sep 29, 2014, at 21:06, myasuka <myasuka@> wrote:
>
>> Our cluster is a standalone cluster with 16 computing nodes; each node has 16 cores. I set SPARK_WORKER_INSTANCES to 1 and SPARK_WORKER_CORES to 32, we submit 512 tasks altogether, and this setup helps increase the concurrency. But if I set SPARK_WORKER_INSTANCES to 2 and SPARK_WORKER_CORES to 16, it doesn't work as well.
>>
>> Thank you for your reply.
>>
>> Yi Tian wrote:
>>> for yarn-client mode:
>>>
>>> SPARK_EXECUTOR_CORES * SPARK_EXECUTOR_INSTANCES = 2 (or 3) * TotalCoresOnYourCluster
>>>
>>> for standalone mode:
>>>
>>> SPARK_WORKER_INSTANCES * SPARK_WORKER_CORES = 2 (or 3) * TotalCoresOnYourCluster
>>>
>>> Best Regards,
>>>
>>> Yi Tian
>>> tianyi.asiainfo@
>>>
>>> On Sep 28, 2014, at 17:59, myasuka <myasuka@> wrote:
>>>
>>>> Hi, everyone
>>>>
>>>> I have come across a problem with increasing the concurrency. In a program, after the shuffle write, each node should fetch 16 pairs of matrices to do matrix multiplication, such as:
>>>>
>>>> import breeze.linalg.{DenseMatrix => BDM}
>>>>
>>>> pairs.map(t => {
>>>>   val b1 = t._2._1.asInstanceOf[BDM[Double]]
>>>>   val b2 = t._2._2.asInstanceOf[BDM[Double]]
>>>>
>>>>   val c = (b1 * b2).asInstanceOf[BDM[Double]]
>>>>
>>>>   (new BlockID(t._1.row, t._1.column), c)
>>>> })
>>>>
>>>> Each node has 16 cores. However, no matter whether I set 16 tasks or more per node, the concurrency cannot get higher than 60%, which means not every core on the node is computing. When I check the running log on the WebUI, the shuffle read and write amounts per task show that some tasks do the matrix multiplication once, some do it twice, and some not at all.
>>>>
>>>> So I thought of using Java multi-threading to increase the concurrency. I wrote a Scala program that calls Java multi-threading without Spark on a single node; watching the 'top' monitor, I found this program can use up to 1500% CPU (meaning nearly every core is computing). But I have no idea how to use Java multi-threading inside an RDD transformation.
>>>>
>>>> Can anyone provide some example code for using Java multi-threading in an RDD transformation, or any other idea to increase the concurrency?
>>>>
>>>> Thanks for all