[ https://issues.apache.org/jira/browse/CRUNCH-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731249#comment-14731249 ]
Surbhi Mungre commented on CRUNCH-558:
--------------------------------------

You should consider adding a configuration option to opt in to or out of displaying counters on the Spark UI. I might be wrong, but going by [1], it seems that displaying accumulators can affect the performance of an application. I am not sure what happens when the Map that stores the counters becomes too large, and I don't know whether we would really see a noticeable difference.

{quote}
In addition, just adding a name to your accumulator also has the side-effect that the UI will call .toString on the accumulator update from each task. So if you did use an accumulator on a more complex type with an expensive .toString, just giving the accumulator a name could destroy performance. We’re left with the strange advice to users: if you are just using a counter, make sure you add a name to your counter; but if it’s something more complicated than a counter, be sure you do not add a name.
{quote}

[1] http://imranrashid.com/posts/Spark-Accumulators/

> Add name to Spark Accumulators
> ------------------------------
>
>                 Key: CRUNCH-558
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-558
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>             Fix For: 0.14.0
>
>         Attachments: CRUNCH-558.patch
>
>
> It was brought up on the mailing list that our Crunch counters are not
> showing up on the Spark webui possibly because they are not named.
> {quote}
> We are currently testing a few capabilities using Spark and one thing we
> noticed in Spark is they don't list any user defined accumulators on web UI.
> On MapReduce I would imagine counters being displayed on the job page,
> however on a SparkPipeline I was only able to pull counter information from
> PipelineResult#getStageResult().
> I think the reason these accumulators are not visible on web UI is because
> crunch does not name these accumulators. Spark expects an accumulator to have
> a name to be visible on the UI.
> https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-spark/src/main/java/org/apache/crunch/impl/spark/SparkRuntime.java#L125-L126
> https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala#L616-L624
> (accumulator API with Name)
> I would like to know if it's possible in crunch to name these accumulators so
> they are available in web UI. This will give us an experience where users can
> monitor/watch accumulators from web UI to obtain key information about their
> jobs.
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
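A rough sketch of the opt-in/opt-out setting suggested in the comment, written in Java because Crunch's Spark runtime is Java. It is an illustration under stated assumptions, not Crunch code: the property name `crunch.spark.counters.show-in-ui`, the accumulator name `crunch-counters`, and the `AccumulatorNamingSketch` class are all hypothetical. The helper only decides whether the counters accumulator gets a name; the Spark 1.x `JavaSparkContext.accumulator` overloads mentioned in the comments are where that decision would be applied.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper; nothing here is an existing Crunch or Spark API.
class AccumulatorNamingSketch {

    // Hypothetical configuration key for opting in to or out of UI display.
    static final String SHOW_IN_UI_KEY = "crunch.spark.counters.show-in-ui";

    // Hypothetical name for the accumulator that holds Crunch counters.
    static final String ACCUMULATOR_NAME = "crunch-counters";

    /**
     * Returns the name to give the counters accumulator, or Optional.empty()
     * if it should stay unnamed. Named accumulators appear in the Spark web UI,
     * but (per the linked blog post) the UI also calls toString on each task's
     * update, which may be costly for a large counter map.
     */
    static Optional<String> accumulatorName(Properties conf) {
        // Named by default in this sketch; any value other than "true"
        // (ignoring case) opts out.
        boolean showInUi =
            Boolean.parseBoolean(conf.getProperty(SHOW_IN_UI_KEY, "true"));
        return showInUi ? Optional.of(ACCUMULATOR_NAME) : Optional.empty();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(accumulatorName(conf)); // Optional[crunch-counters]

        conf.setProperty(SHOW_IN_UI_KEY, "false");
        System.out.println(accumulatorName(conf)); // Optional.empty

        // With Spark 1.x's JavaSparkContext, a present name would be passed to
        // accumulator(initialValue, name, param); an empty result would fall back
        // to accumulator(initialValue, param), which leaves it off the web UI.
    }
}
```

Defaulting to a named accumulator mirrors what this issue asks for; a job whose counter map grows large enough for the per-task `.toString` calls described in [1] to matter could turn naming off.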