[ https://issues.apache.org/jira/browse/SPARK-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin resolved SPARK-603.
-------------------------------
Resolution: Won't Fix
Closing this one as part of [~aash]'s cleanup. I think this problem is being
fixed as we add accumulator / metrics values to the web UI.
> add simple Counter API
> ----------------------
>
> Key: SPARK-603
> URL: https://issues.apache.org/jira/browse/SPARK-603
> Project: Spark
> Issue Type: New Feature
> Priority: Minor
>
> Users need a very simple way to create counters in their jobs. Accumulators
> provide a way to do this, but they are a little clunky, for two reasons:
> 1) the setup is a nuisance;
> 2) with delayed evaluation, you don't know when the code will actually run, so
> it's hard to look at the values.
> Consider this code:
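> {code}
> import org.apache.spark.SparkContext
> import org.apache.spark.rdd.RDD
>
> // MyCustomClass and isOK are defined elsewhere in the job
> def filterBogus(rdd: RDD[MyCustomClass], sc: SparkContext): RDD[MyCustomClass] = {
>   val filterCount = sc.accumulator(0)
>   // filter is lazy: this closure only runs once an action is called on `filtered`
>   val filtered = rdd.filter { r =>
>     if (isOK(r)) true else { filterCount += 1; false }
>   }
>   // Runs immediately, before any task has executed, so this prints 0
>   println("removed " + filterCount.value + " records")
>   filtered
> }
> {code}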
> The println will always say 0 records were filtered, because it is printed
> before any action has run the filter. I could print out the value later on,
> but that would destroy the modularity of the method: it's rather ugly to
> return the accumulator just so that it can get printed later on. (And of
> course, the caller in turn might not know when the filter is going to get
> applied, and would have to pass the accumulator up even further ...)
> I'd like to have Counters that automatically get printed out whenever a
> stage has run, along with some API to get them back. I realize this is
> tricky because a stage can get recomputed, so maybe the counters should only
> be incremented once, even if the stage is recomputed.
> Maybe a more general way to do this is to provide some callback for whenever
> an RDD is computed: by default it would just print the counters, but the
> user could replace it with a custom handler.
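The delayed-evaluation problem in the `filterBogus` snippet can be reproduced without a cluster. The following is a minimal plain-Scala sketch, not Spark code. It assumes a lazy collection `view` is a fair stand-in for an RDD transformation, a local `var` plays the role of the accumulator, and `.toList` plays the role of an action. `Record` and its `ok` field are hypothetical, and `isOK` only mirrors the undefined helper in the issue's snippet.

```scala
// Plain-Scala model (not Spark): `view` makes `filter` lazy, much like an RDD
// transformation, so the counter only changes once the view is forced.
final case class Record(id: Int, ok: Boolean) // hypothetical record type

def isOK(r: Record): Boolean = r.ok // hypothetical stand-in for the issue's isOK

var removed = 0 // plays the role of the accumulator

val records = List(Record(1, ok = true), Record(2, ok = false), Record(3, ok = false), Record(4, ok = true))

// Nothing is evaluated here; the predicate is only captured.
val filtered = records.view.filter { r =>
  if (isOK(r)) true else { removed += 1; false }
}

val countBeforeAction = removed // 0: the predicate has not run yet
val kept = filtered.toList      // the "action" that forces evaluation
val countAfterAction = removed  // 2: the predicate has now run on every record

println(s"removed before action: $countBeforeAction, after action: $countAfterAction")
```

Reading the counter only after the action is the workaround the issue calls unmodular: whoever triggers the action must also hold on to the counter.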
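The issue asks for counters that are printed when a stage completes but incremented only once, even if the stage is recomputed. One way to model this is a driver-side registry that accepts at most one report per partition. The sketch below is hypothetical plain Scala, not an existing Spark API. `StageCounters`, `report`, `totals` and `onStageCompleted` are illustrative names, and the sketch assumes each task reports its partition's counter values as a single map when it finishes.

```scala
import scala.collection.mutable

// Hypothetical driver-side registry (not a Spark API). It keeps at most one
// counter report per (stageId, partitionId), so a recomputed partition is
// counted only once.
final class StageCounters {
  private val reports = mutable.Map.empty[(Int, Int), Map[String, Long]]

  // Called when a task finishes; returns false (and ignores the report) if
  // this partition of this stage has already been reported.
  def report(stageId: Int, partitionId: Int, counts: Map[String, Long]): Boolean = {
    val key = (stageId, partitionId)
    if (reports.contains(key)) false
    else { reports.update(key, counts); true }
  }

  // Sums each named counter over all reported partitions of one stage.
  def totals(stageId: Int): Map[String, Long] = {
    val perPartition = reports.iterator.collect { case ((s, _), counts) if s == stageId => counts }
    perPartition.foldLeft(Map.empty[String, Long]) { (acc, counts) =>
      counts.foldLeft(acc) { case (m, (name, n)) => m.updated(name, m.getOrElse(name, 0L) + n) }
    }
  }

  // Default behavior when a stage completes: print its counters by name.
  def onStageCompleted(stageId: Int): Unit =
    totals(stageId).toSeq.sortBy(_._1).foreach { case (name, n) =>
      println(s"stage $stageId: $name = $n")
    }
}

val counters = new StageCounters
counters.report(stageId = 0, partitionId = 0, counts = Map("filtered" -> 3L))
counters.report(stageId = 0, partitionId = 1, counts = Map("filtered" -> 1L))
// Partition 1 is recomputed (for example after losing an executor); the
// duplicate report is ignored instead of being added a second time.
val duplicateAccepted = counters.report(stageId = 0, partitionId = 1, counts = Map("filtered" -> 1L))
counters.onStageCompleted(0) // prints "stage 0: filtered = 4"
```

Deduplicating by partition only gives the right answer if a recomputed partition would produce the same counts as the first attempt, which holds for deterministic transformations. The sketch simply keeps whichever report arrives first.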
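The more general idea at the end of the issue is a callback fired whenever an RDD is computed, with a default handler that prints the counters. It can be sketched in the same way. This is plain Scala under stated assumptions. `ComputeCallback`, `PrintCounters`, `FilteredDataset` and `RecordingCallback` are hypothetical names, and calling `compute()` stands in for Spark actually evaluating an RDD.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical hook (not a Spark API), fired each time a dataset is computed.
trait ComputeCallback {
  def onComputed(datasetName: String, counters: Map[String, Long]): Unit
}

// Default handler suggested in the issue: just print the counters.
object PrintCounters extends ComputeCallback {
  def onComputed(datasetName: String, counters: Map[String, Long]): Unit =
    counters.toSeq.sortBy(_._1).foreach { case (name, n) => println(s"$datasetName: $name = $n") }
}

// Toy lazy filter: the predicate runs only inside compute(), which then
// reports how many elements were kept and filtered out.
final class FilteredDataset[A](
    name: String,
    source: Seq[A],
    keep: A => Boolean,
    callback: ComputeCallback = PrintCounters) {
  def compute(): Seq[A] = {
    val (kept, dropped) = source.partition(keep)
    callback.onComputed(name, Map("kept" -> kept.size.toLong, "filtered" -> dropped.size.toLong))
    kept
  }
}

// A user-supplied replacement handler that records counters instead of printing them.
final class RecordingCallback extends ComputeCallback {
  val seen = ListBuffer.empty[(String, Map[String, Long])]
  def onComputed(datasetName: String, counters: Map[String, Long]): Unit =
    seen += ((datasetName, counters))
}

val recorder = new RecordingCallback
val evens = new FilteredDataset[Int]("evens", 1 to 10, n => n % 2 == 0, recorder)
val callsBeforeCompute = recorder.seen.size // 0: constructing the dataset runs nothing
val result = evens.compute()

// With the default handler, computing prints "odds: filtered = 5" and "odds: kept = 5".
new FilteredDataset[Int]("odds", 1 to 10, n => n % 2 == 1).compute()
```

Passing the handler in as a parameter keeps `filterBogus`-style methods modular. Such a method no longer has to return its counter, because whoever evaluates the data receives the counts through the callback.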
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)