No, I'm not saying side effects change the count. But not executing the map() function at all certainly matters for that function's side effects: side effects that should take place never do. I am not sure that is something to be 'fixed'; it's a legitimate question.
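To make the side-effect point concrete, here is a toy model in Python. It is not Spark's API: `LazyMapped`, its `shortcut` flag and `run` are hypothetical names. Because a map is one-to-one, a count could be answered from the parent's size alone. But then the mapped function never runs, and neither does anything it does besides returning a value.

```python
# Toy model (NOT Spark's API): a lazily mapped dataset whose count()
# can optionally short-circuit, since a one-to-one map never changes the size.

class LazyMapped:
    """Hypothetical lazy 'map' over a parent list, loosely mimicking an RDD transformation."""

    def __init__(self, parent, fn):
        self.parent = parent  # source elements
        self.fn = fn          # function applied lazily, once per element

    def collect(self):
        # Evaluating the transformation runs fn (and its side effects) per element.
        return [self.fn(x) for x in self.parent]

    def count(self, shortcut=False):
        if shortcut:
            # Hypothetical optimization: answer from the parent's size; fn never runs.
            return len(self.parent)
        # Action semantics: actually evaluate the transformation, then count.
        return len(self.collect())


def run(shortcut):
    seen = []

    def tag(x):
        seen.append(x)  # side effect: record every element the function sees
        return x * 10

    n = LazyMapped([1, 2, 3], tag).count(shortcut=shortcut)
    return n, seen


print(run(shortcut=False))  # (3, [1, 2, 3])
print(run(shortcut=True))   # (3, [])
```

Both calls return the same count; only the evaluating one triggers the side effects.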
You can persist an RDD if you do not want to compute it twice.

On Sat, Mar 28, 2015 at 1:05 PM, jimfcarroll <jimfcarr...@gmail.com> wrote:
> Hi Sean,
>
> Thanks for the response.
>
> I can't imagine a case (though my imagination may be somewhat limited) where
> even map side effects could change the number of elements in the resulting
> map.
>
> I guess "count" wouldn't officially be an 'action' if it were implemented
> this way. At least it wouldn't ALWAYS be one.
>
> My example was contrived. We're passing RDDs to functions. If that RDD is an
> instance of my class, then its count() may take a shortcut. If I
> map/zip/zipWithIndex/mapPartition/etc. first then I'm stuck with a call that
> literally takes 100s to 1000s of times longer (seconds vs hours on some of
> our datasets) and since my custom RDDs are immutable they cache the count
> call so a second invocation is the cost of a method call's overhead.
>
> I could fix this in Spark if there's any interest in that change. Otherwise
> I'll need to overload more RDD methods for my own purposes (like all of the
> transformations). Of course, that will be more difficult because those
> intermediate classes (like MappedRDD) are private, so I can't extend them.
>
> Jim
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-count-tp11298p11302.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
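The persist suggestion at the top of this reply can be sketched with another toy Python model. It is not Spark's API: `LazyDataset` and `actions_with` are hypothetical names. Without persisting, every action re-runs the computation. Once persisted, the first action materializes the result and later actions reuse it. This is roughly the same memoization Jim describes for his immutable custom RDDs caching count().

```python
# Toy model (NOT Spark's API) of persisting a lazily computed dataset:
# the first action materializes the elements; later actions reuse them.

class LazyDataset:
    def __init__(self, compute):
        self._compute = compute  # zero-arg function that produces the elements
        self._persist = False
        self._cached = None

    def persist(self):
        # Only marks the dataset; nothing is computed until an action runs.
        self._persist = True
        return self

    def _elements(self):
        if self._cached is not None:
            return self._cached
        result = self._compute()
        if self._persist:
            self._cached = result
        return result

    def count(self):
        return len(self._elements())

    def collect(self):
        return list(self._elements())


def actions_with(persist):
    evaluations = []

    def expensive():
        evaluations.append(1)  # record each time the computation actually runs
        return [x * x for x in range(5)]

    ds = LazyDataset(expensive)
    if persist:
        ds.persist()
    ds.count()
    ds.collect()
    return len(evaluations)


print(actions_with(persist=False))  # 2: each action recomputes
print(actions_with(persist=True))   # 1: the second action reuses the result
```

In Spark itself the corresponding calls are `persist()` (or `cache()`, which uses the default `MEMORY_ONLY` storage level) followed by actions such as `count()`. Unlike this toy, Spark may evict cached partitions and recompute them from lineage when memory is short.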