I just saw that you call toArray on an RDD. It is deprecated, and it
copies all of the data to the driver. countByValue is what you need:

val samples = sc.textFile("s3n://geonames")
val counts = samples.countByValue()
val result = samples.map(l => (l, counts.getOrElse(l, 0L)))
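
In the snippet above, counts is a local Map on the driver, so it is
captured by the map closure. If the number of distinct lines is large,
one option is to broadcast the map explicitly. The following is only a
minimal sketch, not something tested against your job: the object name
CountsSketch, the helper withCounts, the local[2] master, and the sample
data are all made up for illustration, and it assumes spark-core is on
the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object CountsSketch {
  // Pair every line with the number of times it occurs in `lines`.
  def withCounts(sc: SparkContext, lines: RDD[String]): RDD[(String, Long)] = {
    // countByValue() is an action: it returns a local Map on the driver.
    val counts = lines.countByValue()
    // Broadcast the map so the closure below captures only the broadcast
    // handle, not the full map.
    val countsBc = sc.broadcast(counts)
    lines.map(l => (l, countsBc.value.getOrElse(l, 0L)))
  }

  def main(args: Array[String]): Unit = {
    // local[2] and the sample data are placeholders, not your real setup.
    val conf = new SparkConf().setAppName("counts-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val samples = sc.parallelize(Seq("a", "b", "a", "c", "a"))
      withCounts(sc, samples).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With your data you would pass sc.textFile("s3n://geonames") as lines. The
map itself still has to fit in driver memory, so this only helps with
the size of the serialized tasks.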

Could you also try to use the latest branch-1.1 or master with the
default akka.frameSize setting? The serialized task size should be
small because we now use broadcast RDD objects.

-Xiangrui

On Fri, Aug 15, 2014 at 5:11 PM, jerryye <jerr...@gmail.com> wrote:
> Hi Xiangrui,
> You were right, I had to use --driver-memory instead of setting it in
> spark-defaults.conf.
>
> However, now my job just hangs with the following message:
> 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as
> 29433434 bytes in 202 ms
> 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID
> 3 on executor 1: ip-10-226-198-31.us-west-2.compute.internal (PROCESS_LOCAL)
> 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as
> 29433434 bytes in 203 ms
>
> Any ideas on where else to look?
>
>
> On Fri, Aug 15, 2014 at 3:29 PM, Xiangrui Meng [via Apache Spark Developers
> List] <ml-node+s1001551n7883...@n3.nabble.com> wrote:
>
>> Did you verify the driver memory in the Executor tab of the WebUI? I
>> think you need `--driver-memory 8g` with spark-shell or spark-submit
>> instead of setting it in spark-defaults.conf.
>>
>> On Fri, Aug 15, 2014 at 12:41 PM, jerryye <[hidden email]> wrote:
>>
>> > Setting spark.driver.memory has no effect. It's still hanging trying to
>> > compute result.count when I'm sampling greater than 35% regardless of what
>> > value of spark.driver.memory I'm setting.
>> >
>> > Here are my settings:
>> > export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
>> > export SPARK_MEM=10g
>> >
>> > in conf/spark-defaults:
>> > spark.driver.memory 1500
>> > spark.serializer org.apache.spark.serializer.KryoSerializer
>> > spark.kryoserializer.buffer.mb 500
>> > spark.executor.memory 58315m
>> > spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/
>> > spark.executor.extraClassPath /root/ephemeral-hdfs/conf
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-developers-list.1001551.n3.nabble.com/spark-akka-frameSize-stalls-job-in-1-1-0-tp7865p7877.html
>> > Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> --
> View this message in context: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/spark-akka-frameSize-stalls-job-in-1-1-0-tp7865p7886.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
