Try mapPartitions, which gives you an iterator, and you can produce an iterator back.
On Tue, Jun 30, 2015 at 11:01 AM, RJ Nowling <rnowl...@gmail.com> wrote: > Hi all, > > I have a problem where I have a RDD of elements: > > Item1 Item2 Item3 Item4 Item5 Item6 ... > > and I want to run a function over them to decide which runs of elements to > group together: > > [Item1 Item2] [Item3] [Item4 Item5 Item6] ... > > Technically, I could use aggregate to do this, but I would have to use a > List of List of T which would produce a very large collection in memory. > > Is there an easy way to accomplish this? e.g.,, it would be nice to have > a version of aggregate where the combination function can return a complete > group that is added to the new RDD and an incomplete group which is passed > to the next call of the reduce function. > > Thanks, > RJ >