Hi Sean, Thanks for the response.
I can't imagine a case (though my imagination may be somewhat limited) where even side effects in a map function could change the number of elements in the result. I guess "count" wouldn't officially be an 'action' if it were implemented this way. At least it wouldn't ALWAYS be one.

My example was contrived. We're passing RDDs to functions. If the RDD is an instance of my class, its count() can take a shortcut. If I call map/zip/zipWithIndex/mapPartitions/etc. first, the result is a plain RDD and I'm stuck with a count() that takes literally hundreds to thousands of times longer (seconds vs. hours on some of our datasets). And since my custom RDDs are immutable, they cache the result of count(), so a second invocation costs no more than a method call.

I could fix this in Spark if there's any interest in that change. Otherwise I'll need to override more RDD methods for my own purposes (all of the transformations, for instance). Of course, that will be more difficult because the intermediate classes (like MappedRDD) are private, so I can't extend them.

Jim
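
P.S. A rough sketch of the idea, written as a toy model in plain Scala rather than against Spark itself. `Countable` and `KnownSizeSource` are hypothetical names invented for illustration, and nothing here reflects how Spark's RDD classes are actually implemented. It only shows a count-preserving transformation handing count() back to its parent, which in turn may have a cached shortcut.

```scala
// Toy model in plain Scala; NOT Spark code. All names are hypothetical.
trait Countable[A] {
  // Each call stands in for one full (expensive) pass over the data.
  def iterator: Iterator[A]

  // Default count: walk every element.
  def count(): Long = iterator.foldLeft(0L)((n, _) => n + 1L)

  // map is one-to-one, so the result can reuse the parent's count.
  // Caveat: f is never run by count(), so any side effects in f are skipped.
  def map[B](f: A => B): Countable[B] = {
    val parent = this
    new Countable[B] {
      def iterator: Iterator[B] = parent.iterator.map(f)
      override def count(): Long = parent.count()
    }
  }

  // filter may drop elements, so the result keeps the default full-pass count.
  def filter(p: A => Boolean): Countable[A] = {
    val parent = this
    new Countable[A] {
      def iterator: Iterator[A] = parent.iterator.filter(p)
    }
  }
}

// A source that knows its size up front, standing in for a custom RDD.
final class KnownSizeSource[A](data: Vector[A]) extends Countable[A] {
  var scans = 0 // number of full passes started, for illustration only
  def iterator: Iterator[A] = { scans += 1; data.iterator }

  // Shortcut: the data is immutable, so compute the size once and cache it.
  private lazy val cachedCount: Long = data.length.toLong
  override def count(): Long = cachedCount
}
```

With `val src = new KnownSizeSource(Vector(1, 2, 3, 4))`, calling `src.map(_ * 2).count()` is answered from the cached size and `src.scans` stays at 0, while `src.filter(_ % 2 == 0).count()` still has to make a full pass. The trade-off is the one raised earlier in the thread: the mapped function is never invoked by count(), so count() stops behaving like an action for these cases.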