result visible to the end user, does that mean that under
the hood there is still the same atrocious RAM consumption going on?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/RAM-management-during-cogroup-and-join-tp22505.html
Sent from the Apache Spark
change the total number of elements
included in the result RDD and RAM allocated – right?
From: Tathagata Das [mailto:t...@databricks.com]
Sent: Wednesday, April 15, 2015 9:25 PM
To: Evo Eftimov
Cc: user
Subject: Re: RAM management during cogroup and join
Significant optimizations can be made by doing the joining/cogroup in a
smart way. If you have to join streaming RDDs with the same batch RDD, then
you can first partition the batch RDD using a partitioner and cache it, and
then use
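
The co-partitioning idea described above can be illustrated with a small sketch. This is plain Python, not Spark code: the names (partition_of, partition_by, join_copartitioned) are hypothetical, and it assumes the intent is to reuse one partitioner for the cached batch data and for every streaming batch. In Spark itself the corresponding steps would be partitionBy with a Partitioner such as HashPartitioner, followed by cache().

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed value, for illustration only


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Hypothetical hash partitioner: maps a key to a partition index."""
    return hash(key) % num_partitions


def partition_by(pairs, num_partitions=NUM_PARTITIONS):
    """Group (key, value) pairs into partitions using the shared partitioner."""
    parts = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        parts[partition_of(key, num_partitions)][key].append(value)
    return parts


def join_copartitioned(batch_parts, stream_parts):
    """Inner join of two datasets that were partitioned the same way.

    Partition i of one side only ever needs partition i of the other,
    so the already-partitioned batch side is never reshuffled.
    """
    out = []
    for batch_part, stream_part in zip(batch_parts, stream_parts):
        for key, stream_values in stream_part.items():
            for batch_value in batch_part.get(key, ()):
                for stream_value in stream_values:
                    out.append((key, (batch_value, stream_value)))
    return out


# Partition the batch data once and keep it around (the "cache" step).
batch_parts = partition_by([(1, "a"), (2, "b"), (3, "c")])

# Each incoming micro-batch is partitioned with the same function, then joined.
micro_batches = [[(1, "x"), (3, "y")], [(2, "z"), (4, "w")]]
results = [
    sorted(join_copartitioned(batch_parts, partition_by(mb)))
    for mb in micro_batches
]
```

The point of the sketch is only that the batch side is partitioned a single time; each micro-batch pays for partitioning its own, smaller, data.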
Subject: Re: RAM management during cogroup and join
Agreed.
On Wed, Apr 15, 2015 at 1:29 PM, Evo Eftimov evo.efti...@isecc.com wrote:
That has been done Sir and represents further optimizations – the objective
here was to confirm whether cogroup always results in the previously described
*From:* Tathagata Das [mailto:t...@databricks.com]
*Sent:* Wednesday, April 15, 2015 9:48 PM
*To:* Evo Eftimov
*Cc:* user
*Subject:* Re: RAM management during cogroup and join
Agreed.
On Wed, Apr 15, 2015 at 1:29 PM, Evo Eftimov evo.efti...@isecc.com
wrote:
That has been done Sir
not
that DStreams are some sort of different type of RDDs
From: Tathagata Das [mailto:t...@databricks.com]
Sent: Wednesday, April 15, 2015 11:11 PM
To: Evo Eftimov
Cc: user
Subject: Re: RAM management during cogroup and join
Well, DStream joins are nothing but RDD joins at their core. However