[
https://issues.apache.org/jira/browse/SPARK-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186379#comment-15186379
]
Liwei Lin commented on SPARK-10620:
-----------------------------------
Hi [~andrewor14], in the "\[3\] A Simpler Accumulator API" section of the
design doc:
{quote}
Since the design of this is mostly orthogonal to the rest of this document,
here we only outline
the desire for a new, simpler API, and does not discuss the solution. The
actual design will be in
a separate design doc.
{quote}
Is there anywhere I can find that separate "Simpler Accumulator API" design doc?
Thanks!
> Look into whether accumulator mechanism can replace TaskMetrics
> ---------------------------------------------------------------
>
> Key: SPARK-10620
> URL: https://issues.apache.org/jira/browse/SPARK-10620
> Project: Spark
> Issue Type: Task
> Components: Spark Core
> Reporter: Patrick Wendell
> Assignee: Andrew Or
> Fix For: 2.0.0
>
> Attachments: accums-and-task-metrics.pdf
>
>
> This task is simply to explore whether the internal representation used by
> TaskMetrics could be implemented using accumulators rather than having two
> separate mechanisms. Note that we need to continue to preserve the existing
> "Task Metric" data structures that are exposed to users through event logs
> etc. The question is whether we can use a single internal codepath and perhaps
> make this easier to extend in the future.
> I think a full exploration would answer the following questions:
> - How do the semantics of accumulators on stage retries differ from aggregate
> TaskMetrics for a stage? Could we implement clearer retry semantics for
> internal accumulators to allow them to be the same - for instance, zeroing
> accumulator values if a stage is retried (see the discussion in SPARK-10042)?
> - Are there metrics that do not fit well into the accumulator model, or would
> be difficult to update as an accumulator?
> - If we expose metrics through accumulators in the future rather than
> continuing to add fields to TaskMetrics, what is the best way to ensure
> compatibility?
> - Are there any other considerations?
> - Is it worth it to do this, or is the consolidation too complicated to
> justify?
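
To make the stage-retry question above concrete, here is a toy Python model. It is not Spark code and does not claim to reflect how Spark behaves: the class names, the update format, and the simplifying assumption that a retried stage reruns every task are all hypothetical. It only contrasts an accumulator that sums every update with one that is zeroed when a newer stage attempt reports in.

```python
# Toy model (not Spark code): compares a naive accumulator with one that is
# zeroed when a stage is retried. All names here are hypothetical.

class NaiveAccumulator:
    """Adds every task update, including updates from failed stage attempts."""

    def __init__(self):
        self.value = 0

    def add(self, stage_attempt, amount):
        # The attempt number is ignored on purpose: everything is summed.
        self.value += amount


class ResettingAccumulator:
    """Discards earlier updates whenever a newer stage attempt reports in."""

    def __init__(self):
        self.value = 0
        self.current_attempt = 0

    def add(self, stage_attempt, amount):
        if stage_attempt > self.current_attempt:
            # A retry started: zero the value so only the latest attempt counts.
            self.current_attempt = stage_attempt
            self.value = 0
        if stage_attempt == self.current_attempt:
            self.value += amount


# (stage_attempt, records_read) updates. Assumption: attempt 0 fails after two
# tasks have reported, and attempt 1 reruns all three tasks.
updates = [(0, 10), (0, 20), (1, 10), (1, 20), (1, 30)]

naive, resetting = NaiveAccumulator(), ResettingAccumulator()
for attempt, records in updates:
    naive.add(attempt, records)
    resetting.add(attempt, records)

print(naive.value)      # 90: updates from the failed attempt are counted too
print(resetting.value)  # 60: only the successful attempt is counted
```

In this sketch the two totals diverge as soon as a stage is retried, which is the kind of mismatch that per-stage aggregate metrics and accumulators would need to agree on.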
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)