RDD.toLocalIterator returns the partitions one by one, but each partition arrives with all of its elements materialized, so evaluation is not lazy within a partition. Given the design of Spark, it is very hard to maintain the state of an iterator across runJob calls.
def toLocalIterator: Iterator[T] = {
  // one job per partition; each partition is collected as a whole Array
  def collectPartition(p: Int): Array[T] =
    sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p), allowLocal = false).head
  (0 until partitions.length).iterator.flatMap(collectPartition)
}
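The behavior described above can be sketched in plain Scala, with no Spark involved — the `Seq` of `Array`s below is a hypothetical stand-in for the RDD's partitions, and the `fetched` counter is only for demonstration. Each partition is materialized wholesale when the iterator reaches it, but later partitions are not fetched until then:

```scala
// Sketch (plain Scala, no Spark): simulate toLocalIterator's behavior.
// A "partition" is materialized as a whole Array when first touched,
// but partitions are only fetched one at a time, on demand.
object ToLocalIteratorSketch {
  /** Returns (first two elements consumed, number of partitions fetched). */
  def run(): (List[Int], Int) = {
    val partitions: Seq[Array[Int]] = Seq(Array(1, 2), Array(3, 4), Array(5, 6))
    var fetched = 0
    def collectPartition(p: Int): Array[Int] = { fetched += 1; partitions(p) }
    // Iterator.flatMap is lazy: collectPartition runs only when needed.
    val localIter: Iterator[Int] = partitions.indices.iterator.flatMap(collectPartition)
    val firstTwo = localIter.take(2).toList
    (firstTwo, fetched)
  }
  def main(args: Array[String]): Unit = println(run())
}
```

Consuming only the first two elements touches only the first partition — which is exactly the point made above: laziness holds across partitions, but not inside one.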
Call RDD.toLocalIterator()?
https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html
On Wed, Oct 29, 2014 at 4:15 AM, Dai, Kevin yun...@ebay.com wrote:
Hi, all,
I have an RDD[T]; can I use it like an iterator?
That is, I want every element of this RDD to be computed lazily.
RDD.toLocalIterator() is the suitable solution, but I doubt whether it conforms with the design principle of Spark and RDDs:
all RDD transformations are computed lazily until they end with some action.
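The "lazy until an action" point can be illustrated with a plain-Scala view standing in for an RDD (no Spark API involved; the `applied` counter is just for demonstration). The mapped function runs zero times until a terminal operation forces the pipeline:

```scala
// Sketch of lazy evaluation, using a Scala view as a stand-in for an RDD.
object LazySketch {
  /** Returns (map calls before the action, map calls after, the total). */
  def run(): (Int, Int, Int) = {
    var applied = 0
    // "transformation": builds a lazy pipeline, nothing runs yet
    val mapped = (1 to 5).view.map { x => applied += 1; x * 2 }
    val before = applied
    // "action": forces evaluation of the whole pipeline
    val total = mapped.sum
    (before, applied, total)
  }
  def main(args: Array[String]): Unit = println(run())
}
```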
2014-10-29 15:28 GMT+08:00 Sean Owen so...@cloudera.com:
Call RDD.toLocalIterator()?
Hi, all,
I have an RDD[T]; can I use it like an iterator?
That is, I want every element of this RDD to be computed lazily.
Best Regards,
Kevin.
I think it is already lazily computed, or do you mean something else? The following is the signature of compute in RDD:
def compute(split: Partition, context: TaskContext): Iterator[T]
Thanks.
Zhan Zhang
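Because compute hands back an Iterator[T], elements inside a single partition can stream lazily too. A small plain-Scala sketch of why (the compute below is a toy stand-in, not Spark's): only the elements actually consumed are ever produced.

```scala
// Sketch: an Iterator-returning compute is lazy within a partition.
// The "partition" here is just a Range; produced counts elements generated.
object ComputeSketch {
  /** Returns (first three elements consumed, elements actually produced). */
  def run(): (List[Int], Int) = {
    var produced = 0
    def compute(split: Range): Iterator[Int] =
      split.iterator.map { x => produced += 1; x }
    // Take 3 of 100: the other 97 elements are never generated.
    val firstThree = compute(1 to 100).take(3).toList
    (firstThree, produced)
  }
  def main(args: Array[String]): Unit = println(run())
}
```

This is also why toLocalIterator, as discussed above, is only partly lazy: the per-partition Iterator is lazy inside a task, but crossing back to the driver forces each partition into an Array.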
On Oct 28, 2014, at 8:15 PM, Dai, Kevin yun...@ebay.com wrote:
Hi, all,
I have an RDD[T],
To: user@spark.apache.org
Subject: Re: Use RDD like a Iterator