Neil Ferguson created SPARK-3051:
------------------------------------
Summary: Support looking-up named accumulators in a registry
Key: SPARK-3051
URL: https://issues.apache.org/jira/browse/SPARK-3051
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: Neil Ferguson
This is a proposed enhancement to Spark based on the following mailing list
discussion:
http://apache-spark-developers-list.1001551.n3.nabble.com/quot-Dynamic-variables-quot-in-Spark-td7450.html.
This proposal builds on SPARK-2380 (Support displaying accumulator values in
the web UI) to allow named accumulables to be looked-up in a "registry", as
opposed to having to be passed to every method that need to access them.
The use case was described well by [~shivaram], as follows:
Lets say you have two functions you use
in a map call and want to measure how much time each of them takes. For
example, if you have a code block like the one below and you want to
measure how much time f1 takes as a fraction of the task.
{noformat}
a.map { l =>
val f = f1(l)
... some work here ...
}
{noformat}
It would be really cool if we could do something like
{noformat}
a.map { l =>
val start = System.nanoTime
val f = f1(l)
TaskMetrics.get("f1-time").add(System.nanoTime - start)
}
{noformat}
SPARK-2380 provides a partial solution to this problem -- however the
accumulables would still need to be passed to every function that needs them,
which I think would be cumbersome in any application of reasonable complexity.
The proposal, as suggested by [~pwendell], is to have a "registry" of
accumulables, that can be looked-up by name.
Regarding the implementation details, I'd propose that we broadcast a
serialized version of all named accumulables in the DAGScheduler (similar to
what SPARK-2521 does for Tasks). These can then be deserialized in the
Executor.
Accumulables are already stored in thread-local variables in the Accumulators
object, so exposing these in the registry should be simply a matter of wrapping
this object, and keying the accumulables by name (they are currently keyed by
ID).
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]