Hi Xiangrui,

I wasn't setting spark.driver.memory. I'll try that and report back.
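For anyone following along, a hedged example of what "setting driver memory" looks like in practice (example value only; adjust to your cluster, and note spark.driver.memory must be set before the driver JVM starts, e.g. in spark-defaults.conf or via spark-submit's --driver-memory flag, not in SparkConf at runtime):

```
# conf/spark-defaults.conf (example value, not a recommendation)
spark.driver.memory 10g
```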
In terms of this running on the cluster, my assumption was that calling foreach on an array (I converted samples using toArray) would mean counts is populated locally on the driver. The fully populated object would then be serialized out to the executors. Is this correct? I'm actually trying to load a trie and used the HashMap as an example of loading data into an object that needs to be serialized. Is there a better way of doing this?

- jerry

> On Aug 15, 2014, at 8:36 AM, "Xiangrui Meng [via Apache Spark Developers
> List]" <ml-node+s1001551n7866...@n3.nabble.com> wrote:
>
> Did you set driver memory? You can confirm it in the Executors tab of
> the WebUI. Btw, the code may only work in local mode. In a cluster
> mode, counts will be serialized to remote workers and the result is
> not fetched by the driver after foreach. You can use RDD.countByValue
> instead. -Xiangrui
>
> On Fri, Aug 15, 2014 at 8:18 AM, jerryye <[hidden email]> wrote:
> >
> > Hi All,
> > I'm not sure if I should file a JIRA or if I'm missing something obvious,
> > since the test code I'm trying is so simple. I've isolated the problem I'm
> > seeing to a memory issue, but I don't know which parameter I need to tweak;
> > it does seem related to spark.akka.frameSize. If I sample my RDD with 35%
> > of the data, everything runs to completion; with more than 35%, it fails.
> > In standalone mode, I can run on the full RDD without any problems.
> >
> > // works
> > val samples = sc.textFile("s3n://geonames").sample(false, 0.35) // 64MB, 2849439 lines
> >
> > // fails
> > val samples = sc.textFile("s3n://geonames").sample(false, 0.4) // 64MB, 2849439 lines
> >
> > Any ideas?
> >
> > 1) RDD size is causing the problem. The code below fails as is, but if I
> > swap smallSample for samples, the code runs end to end in both cluster and
> > standalone mode.
> > 2) The error I get is:
> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 3.0:1
> > failed 4 times, most recent failure: TID 12 on host
> > ip-10-251-14-74.us-west-2.compute.internal failed for unknown reason
> > Driver stacktrace:
> > at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
> > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> >
> > 3) Using the 1.1.0 branch, the driver freezes instead of aborting with the
> > error in #2.
> > 4) In 1.1.0, changing spark.akka.frameSize also results in no progress in
> > the driver.
> >
> > Code:
> > val smallSample = sc.parallelize(Array("foo word", "bar word", "baz word"))
> >
> > val samples = sc.textFile("s3n://geonames") // 64MB, 2849439 lines of short strings
> >
> > val counts = new collection.mutable.HashMap[String, Int].withDefaultValue(0)
> >
> > samples.toArray.foreach(counts(_) += 1)
> >
> > val result = samples.map(
> >   l => (l, counts.get(l))
> > )
> >
> > result.count
> >
> > Settings (with or without Kryo doesn't matter):
> > export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
> > export SPARK_MEM=10g
> > spark.akka.frameSize 40
> > #spark.serializer org.apache.spark.serializer.KryoSerializer
> > #spark.kryoserializer.buffer.mb 1000
> > spark.executor.memory 58315m
> > spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/
> > spark.executor.extraClassPath /root/ephemeral-hdfs/conf
> >
> > --
> > View this message in context:
> > http://apache-spark-developers-list.1001551.n3.nabble.com/spark-akka-frameSize-stalls-job-in-1-1-0-tp7865.html
> > Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
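For reference, a minimal sketch (untested here, assuming the Spark 1.x Scala API) of the RDD.countByValue approach Xiangrui suggested, combined with a broadcast variable so the lookup map (or a serializable trie in Jerry's real use case) is shipped to each executor once rather than captured in every task closure:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("counts-broadcast"))

val samples = sc.textFile("s3n://geonames")

// countByValue runs as a distributed job and returns the counts to the
// driver as a Map[String, Long] -- no toArray, no mutable HashMap mutated
// inside a closure.
val counts: Map[String, Long] = samples.countByValue().toMap

// Broadcast the map so executors each receive one read-only copy,
// instead of a full serialized copy per task.
val countsBc = sc.broadcast(counts)

val result = samples.map(l => (l, countsBc.value.getOrElse(l, 0L)))
result.count()
```

Note the original code's `samples.toArray.foreach(counts(_) += 1)` only updates the driver-local map; after closure serialization, executors mutate their own copies, which is why foreach-based aggregation doesn't round-trip results back to the driver.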
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/spark-akka-frameSize-stalls-job-in-1-1-0-tp7865p7871.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.