in memory assumption in cogroup?

2014-09-29 Thread Koert Kuipers
apologies for asking yet again about spark memory assumptions, but i can't seem to keep it in my head. if i use PairRDDFunctions.cogroup, it returns 2 iterables for every key. do the contents of these iterables have to fit in memory? or is the data streamed?
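
To make the shape of the result concrete, here is a minimal plain-Python sketch of what "2 iterables for every key" looks like. It is not Spark code: local_cogroup and the sample data are hypothetical names made up for illustration, and because it builds ordinary in-memory lists it says nothing about whether Spark itself materializes or streams the values.

```python
from collections import defaultdict


def local_cogroup(left, right):
    """Hypothetical local model of cogroup's return shape (not Spark's implementation).

    Takes two lists of (key, value) pairs and returns a dict mapping each key
    to a pair (values_from_left, values_from_right).
    """
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)   # first iterable: values from the left input
    for key, value in right:
        grouped[key][1].append(value)   # second iterable: values from the right input
    return dict(grouped)


# Made-up sample data, for illustration only.
clicks = [("a", 1), ("b", 2), ("a", 3)]
names = [("a", "x"), ("c", "y")]

result = local_cogroup(clicks, names)
for key in sorted(result):
    print(key, result[key])
```

A key that appears in only one input still shows up, with an empty iterable on the other side. The question above is whether, in Spark, each of those per-key iterables has to be held in memory at once.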

Re: in memory assumption in cogroup?

2014-09-29 Thread Liquan Pei
Hi Koert, cogroup is a transformation on an RDD; it creates a CoGroupedRDD and then performs some transformations on it. When an action is later called, the compute() method of the CoGroupedRDD will be called. Roughly speaking, each element in the CoGroupedRDD is fetched one at a time. Thus the contents of