Just saw you used toArray on an RDD. That copies all data to the driver and it is deprecated. countByValue is what you need:
    val samples = sc.textFile("s3n://geonames")
    val counts = samples.countByValue()
    val result = samples.map(l => (l, counts.getOrElse(l, 0L)))

Could you also try the latest branch-1.1 or master with the default
akka.frameSize setting? The serialized task size should be small
because we now use broadcast RDD objects.

-Xiangrui

On Fri, Aug 15, 2014 at 5:11 PM, jerryye <jerr...@gmail.com> wrote:
> Hi Xiangrui,
> You were right, I had to use --driver-memory instead of setting it in
> spark-defaults.conf.
>
> However, now my job just hangs with the following message:
> 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 29433434 bytes in 202 ms
> 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 3 on executor 1: ip-10-226-198-31.us-west-2.compute.internal (PROCESS_LOCAL)
> 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 29433434 bytes in 203 ms
>
> Any ideas on where else to look?
>
>
> On Fri, Aug 15, 2014 at 3:29 PM, Xiangrui Meng [via Apache Spark Developers
> List] <ml-node+s1001551n7883...@n3.nabble.com> wrote:
>
>> Did you verify the driver memory in the Executor tab of the WebUI? I
>> think you need `--driver-memory 8g` with spark-shell or spark-submit
>> instead of setting it in spark-defaults.conf.
>>
>> On Fri, Aug 15, 2014 at 12:41 PM, jerryye <[hidden email]> wrote:
>>
>> > Setting spark.driver.memory has no effect. It's still hanging trying to
>> > compute result.count when I'm sampling greater than 35% regardless of
>> > what value of spark.driver.memory I'm setting.
>> >
>> > Here's my settings:
>> > export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
>> > export SPARK_MEM=10g
>> >
>> > in conf/spark-defaults:
>> > spark.driver.memory 1500
>> > spark.serializer org.apache.spark.serializer.KryoSerializer
>> > spark.kryoserializer.buffer.mb 500
>> > spark.executor.memory 58315m
>> > spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/
>> > spark.executor.extraClassPath /root/ephemeral-hdfs/conf
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-developers-list.1001551.n3.nabble.com/spark-akka-frameSize-stalls-job-in-1-1-0-tp7865p7877.html
>> > Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [hidden email]
>> > For additional commands, e-mail: [hidden email]
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/spark-akka-frameSize-stalls-job-in-1-1-0-tp7865p7886.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
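
P.S. For anyone who wants to try the countByValue pattern from the top of this message without an S3 bucket, here is a minimal local sketch. It is only an illustration under assumptions: the object name CountByValueSketch, the helper withCounts, the local[2] master, the app name, and the tiny in-memory sample are all made up for this example and are not from the original job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object, used only for this illustration.
object CountByValueSketch {

  // Pairs every input line with the number of times it occurs.
  // `lines` is a small in-memory stand-in for sc.textFile("s3n://geonames").
  def withCounts(sc: SparkContext, lines: Seq[String]): Array[(String, Long)] = {
    val samples = sc.parallelize(lines)

    // The counting runs on the executors; the driver receives a map with
    // one entry per distinct line rather than every record.
    val counts = samples.countByValue()

    // `counts` is captured by the closure and shipped along with the tasks.
    val result = samples.map(l => (l, counts.getOrElse(l, 0L)))

    // collect() is acceptable here only because the sample is tiny.
    result.collect()
  }

  def main(args: Array[String]): Unit = {
    // local[2] and the app name are placeholders for a quick local run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("countByValue-sketch")
    val sc = new SparkContext(conf)
    try {
      withCounts(sc, Seq("paris", "london", "paris", "tokyo", "paris")).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that the map returned by countByValue still has to fit in driver memory, so this only helps when the number of distinct lines is modest. If it is not, keeping the counts distributed (for example with reduceByKey followed by a join) is probably the safer route.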