I was looking at some of the Partition implementations in core/rdd and at getOrCompute(...) in CacheManager. It appears that getOrCompute(...) returns an InterruptibleIterator, which delegates to a wrapped Iterator. That would imply that Partitions should extend Iterator, but that is not always the case. For example, the Partition implementations for these RDDs do not extend Iterator:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PartitionwiseSampledRDD.scala
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala

Why is that? Shouldn't all Partitions be Iterators? Clearly I'm missing something.

On a related subject, I was thinking of documenting the data flow of RDDs in more detail. The code is not hard to follow, but it's nice to have a simple picture with the major components and some explanation of the flow. The declaration of Partition is throwing me off.

Thanks!

-----
--
Madhu
https://www.linkedin.com/in/msiddalingaiah

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-tp9804.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
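
P.S. To make the question concrete, here is a minimal standalone sketch (simplified names, not Spark's actual code; the real RDD.compute also takes a TaskContext, omitted here) of the relationship as I currently read it: a Partition is just serializable metadata naming a slice of the data, while the Iterator that getOrCompute(...) ultimately wraps comes from calling compute(...) on the RDD with that Partition.

```scala
// Sketch only: a Partition is an identifier for a slice, not an Iterator.
// In Spark, Partition is a marker trait carrying little more than an index.
trait Partition extends Serializable {
  def index: Int
}

case class SimplePartition(index: Int) extends Partition

// Hypothetical simplified RDD: compute(...) is what turns a Partition into
// the Iterator over that slice's elements.
class SimpleRDD(data: Seq[Seq[Int]]) {
  def getPartitions: Array[Partition] =
    data.indices.map(i => SimplePartition(i): Partition).toArray

  def compute(split: Partition): Iterator[Int] =
    data(split.index).iterator
}

val rdd = new SimpleRDD(Seq(Seq(1, 2), Seq(3, 4, 5)))
val parts = rdd.getPartitions
// The Partition itself is not an Iterator; it only names the slice.
val it = rdd.compute(parts(1))
println(it.toList) // List(3, 4, 5)
```

If that reading is right, the two RDDs linked above would only need Partition subclasses that carry whatever metadata their compute(...) requires, which may explain why they don't extend Iterator.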