Re: RAM management during cogroup and join

2015-04-15 Thread Tathagata Das
... result visible to the end user, does that mean that under the hood there is still the same atrocious RAM consumption going on?
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RAM-management-during-cogroup-and-join-tp22505.html

RE: RAM management during cogroup and join

2015-04-15 Thread Evo Eftimov
... change the total number of elements included in the result RDD and RAM allocated – right?
Quoting Tathagata Das (Wednesday, April 15, 2015 9:25 PM): Significant optimizations can be made ...

Re: RAM management during cogroup and join

2015-04-15 Thread Tathagata Das
Significant optimizations can be made by doing the join/cogroup in a smart way. If you have to join streaming RDDs with the same batch RDD, then you can first partition the batch RDD using a partitioner and cache it, and then use ...
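
The pre-partitioning idea in the message above can be illustrated without Spark. What follows is a minimal plain-Python sketch, not Spark API and not the poster's code: `NUM_PARTITIONS`, `partition_of`, `hash_partition`, and `join_copartitioned` are hypothetical names, and the sketch assumes the step after the truncated "and then use" is to apply the same partitioning function to each streaming batch. The batch data is partitioned once and kept; every micro-batch is then joined against those cached partitions, so only the streaming side is routed per batch.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Assign a key to a partition; both sides must use the same function."""
    return hash(key) % num_partitions


def hash_partition(pairs, num_partitions=NUM_PARTITIONS):
    """Split (key, value) pairs into per-partition dicts mapping key -> values."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_of(key, num_partitions)][key].append(value)
    return partitions


def join_copartitioned(batch_partitions, stream_pairs):
    """Inner-join one micro-batch against the pre-partitioned batch data."""
    num_partitions = len(batch_partitions)
    joined = []
    for key, stream_value in stream_pairs:
        # Only the partition that can hold this key is consulted.
        bucket = batch_partitions[partition_of(key, num_partitions)]
        for batch_value in bucket.get(key, []):
            joined.append((key, (batch_value, stream_value)))
    return joined


# Partition the batch data once and keep ("cache") the result.
batch = [("a", 1), ("b", 2), ("c", 3)]
cached_batch = hash_partition(batch)

# Each micro-batch reuses the cached partitions; the batch side is
# never partitioned again.
micro_batches = [[("a", "x"), ("d", "y")], [("b", "z"), ("a", "w")]]
results = [join_copartitioned(cached_batch, mb) for mb in micro_batches]
```

In Spark itself the corresponding building blocks would be a partitioner applied to the batch RDD followed by caching; the sketch only shows why reusing one partitioning avoids redoing work on the batch side for every streaming interval.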

RE: RAM management during cogroup and join

2015-04-15 Thread Evo Eftimov
Quoting Tathagata Das: Agreed. On Wed, Apr 15, 2015 at 1:29 PM, Evo Eftimov evo.efti...@isecc.com wrote: That has been done, Sir, and represents further optimizations – the objective here was to confirm whether cogroup always results in the previously described ...

RAM management during cogroup and join

2015-04-15 Thread Evo Eftimov

Re: RAM management during cogroup and join

2015-04-15 Thread Tathagata Das
Quoting Tathagata Das (Wednesday, April 15, 2015 9:48 PM): Agreed. On Wed, Apr 15, 2015 at 1:29 PM, Evo Eftimov evo.efti...@isecc.com wrote: That has been done, Sir ...

RE: RAM management during cogroup and join

2015-04-15 Thread Evo Eftimov
... not that DStreams are some sort of different type of RDDs.
Quoting Tathagata Das (Wednesday, April 15, 2015 11:11 PM): Well, DStream joins are nothing but RDD joins at their core. However ...