broadcast: OutOfMemoryError

2014-12-11 Thread ll
Hi. I'm running into an OutOfMemoryError when broadcasting a large array. What is
the best way to handle this?

Should I split the array into smaller arrays before broadcasting, and then combine
them locally on each node?
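
Roughly, this is what I have in mind (just a sketch of the idea; bigArray,
chunkSize, and the RDD at the end are placeholder names, and I'm assuming a plain
Array[Double] plus an existing SparkContext sc):

  import org.apache.spark.broadcast.Broadcast

  // Placeholder for the large array that blows up when broadcast in one piece.
  val bigArray: Array[Double] = Array.fill(50000000)(1.0)

  // Split it into fixed-size chunks and broadcast each chunk separately.
  val chunkSize = 5000000
  val chunks: Array[Broadcast[Array[Double]]] =
    bigArray.grouped(chunkSize).map(chunk => sc.broadcast(chunk)).toArray

  // On each executor, stitch the chunks back together before using them.
  val someRdd = sc.parallelize(1 to 100)
  val result = someRdd.map { i =>
    val reassembled: Array[Double] = chunks.flatMap(_.value)
    reassembled(i)  // ...do the real work with the full array here...
  }.collect()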

thanks!






Re: broadcast: OutOfMemoryError

2014-12-11 Thread Sameer Farooqui
Is the OOM happening in the driver JVM or in one of the executor JVMs? How much
memory does each JVM have?
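
If you're not sure, a quick way to check from the spark-shell is something like
this (sketch only; assumes a live SparkContext sc, and the fallback string is just
a placeholder for "not explicitly configured"):

  // Heap actually available to the driver JVM running this shell.
  val driverHeapMb = Runtime.getRuntime.maxMemory / (1024 * 1024)

  // Executor memory as set on the SparkConf, if it was configured explicitly.
  val executorMem = sc.getConf.get("spark.executor.memory", "<not explicitly set>")

  println(s"driver heap ~ $driverHeapMb MB, executor memory = $executorMem")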

How large is the data you're trying to broadcast? If it's large enough, you may
want to consider just persisting the data to distributed storage (like HDFS) and
reading it back in through the normal RDD read methods like sc.textFile().
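
Very roughly, something like this (the HDFS path and the one-value-per-line format
are just placeholders; adapt them to your data):

  // Write the large array out once (one value per line) instead of broadcasting it...
  val bigArray: Array[Double] = Array.fill(1000000)(1.0)  // placeholder data
  sc.parallelize(bigArray).saveAsTextFile("hdfs:///tmp/big-array")

  // ...and later read it back as a distributed RDD through the normal read path.
  val bigArrayRdd = sc.textFile("hdfs:///tmp/big-array").map(_.toDouble)

That way the data stays distributed across the cluster instead of being shipped
whole to the driver and every executor.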

Maybe someone else can comment on the largest data sizes that are generally
recommended for use with broadcast variables...



On Thu, Dec 11, 2014 at 10:14 AM, ll duy.huynh@gmail.com wrote:

 Hi. I'm running into an OutOfMemoryError when broadcasting a large array. What is
 the best way to handle this?

 Should I split the array into smaller arrays before broadcasting, and then combine
 them locally on each node?

 thanks!


