Re: Use RDD like an Iterator

2014-10-30 Thread Zhan Zhang
RDD.toLocalIterator returns the partitions one at a time, but with all elements of a partition materialized, so evaluation is not lazy within a partition. Given the design of Spark, it is very hard to maintain the state of an iterator across runJob calls. def toLocalIterator: Iterator[T] = { def collectPartition(p: Int): Array[T]
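
For reference, here is the method being quoted, roughly as it appears in the Spark 1.x RDD source (a reconstruction; details such as the allowLocal flag vary across versions):

    def toLocalIterator: Iterator[T] = {
      def collectPartition(p: Int): Array[T] = {
        // One separate job per partition; the whole partition is
        // materialized as an Array on the driver.
        sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p), allowLocal = false).head
      }
      // flatMap over an Iterator is lazy, so a partition's job only
      // runs once the consumer advances into that partition.
      (0 until partitions.length).iterator.flatMap(i => collectPartition(i))
    }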

Re: Use RDD like an Iterator

2014-10-29 Thread Sean Owen
Call RDD.toLocalIterator()? https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html On Wed, Oct 29, 2014 at 4:15 AM, Dai, Kevin yun...@ebay.com wrote: Hi all, I have an RDD[T]; can I use it like an iterator, i.e. compute every element of this RDD lazily?
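
A minimal usage sketch (assuming an existing SparkContext named sc; the RDD contents are illustrative):

    val rdd = sc.parallelize(1 to 1000, numSlices = 10)
    val it: Iterator[Int] = rdd.toLocalIterator
    // Only the partitions the iterator actually reaches are fetched
    // to the driver, one job per partition.
    it.take(5).foreach(println)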

Re: Use RDD like an Iterator

2014-10-29 Thread Yanbo Liang
RDD.toLocalIterator() is the suitable solution, but I doubt whether it conforms to the design principles of Spark and RDDs: all RDD transformations are computed lazily and only materialize when the chain ends with an action. 2014-10-29 15:28 GMT+08:00 Sean Owen so...@cloudera.com: Call RDD.toLocalIterator()?
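
To illustrate the laziness being discussed, a small sketch (assuming a SparkContext sc):

    // Transformations only build a lineage graph; nothing runs here.
    val doubled = sc.parallelize(1 to 100, 4).map(_ * 2)

    // Actions trigger execution. collect() runs a single job over all
    // partitions at once, while toLocalIterator schedules one job per
    // partition as the iterator advances, keeping driver memory bounded
    // by the largest partition at the cost of extra scheduling overhead.
    val everything = doubled.collect()
    val oneByOne = doubled.toLocalIterator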

Use RDD like an Iterator

2014-10-28 Thread Dai, Kevin
Hi all, I have an RDD[T]; can I use it like an iterator? That is, can I compute every element of this RDD lazily? Best Regards, Kevin.

Re: Use RDD like an Iterator

2014-10-28 Thread Zhan Zhang
I think it is already lazily computed, or do you mean something else? Following is the signature of compute in RDD: def compute(split: Partition, context: TaskContext): Iterator[T]. Thanks. Zhan Zhang. On Oct 28, 2014, at 8:15 PM, Dai, Kevin yun...@ebay.com wrote: Hi all, I have an RDD[T],
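
To make that signature concrete, a hypothetical minimal RDD subclass (TimesTwoRDD is an illustrative name, not from the thread) whose compute() chains lazily off its parent's iterator:

    import org.apache.spark.{Partition, TaskContext}
    import org.apache.spark.rdd.RDD

    // Within a task, elements are pulled through this Iterator one at a
    // time; the map adds no buffering, so per-element evaluation is lazy.
    class TimesTwoRDD(prev: RDD[Int]) extends RDD[Int](prev) {
      override protected def getPartitions: Array[Partition] = prev.partitions
      override def compute(split: Partition, context: TaskContext): Iterator[Int] =
        prev.iterator(split, context).map(_ * 2)
    }

So laziness already holds within a task; what the question is really about is lazy, element-by-element consumption at the driver across partitions, which is what toLocalIterator approximates.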

RE: Use RDD like an Iterator

2014-10-28 Thread Ganelin, Ilya
To: user@spark.apache.org. Subject: Re: Use RDD like an Iterator. Quoting Zhan Zhang: I think it is already lazily computed, or do you mean something else? Following is the signature of compute in RDD: def compute(split: Partition, context: TaskContext): Iterator[T]. Thanks. Zhan Zhang. On Oct 28, 2014, at 8:15 PM, Dai