[
https://issues.apache.org/jira/browse/SPARK-30666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-30666:
----------------------------------
Affects Version/s: (was: 3.0.0)
3.1.0
> Reliable single-stage accumulators
> ----------------------------------
>
> Key: SPARK-30666
> URL: https://issues.apache.org/jira/browse/SPARK-30666
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: Enrico Minack
> Priority: Major
>
> This proposes a pragmatic improvement to allow for reliable single-stage
> accumulators. Under the assumption that a given stage / partition / rdd
> produces identical results, non-deterministic code incrementing accumulators
> also produces identical accumulator increments on success. Rerunning
> partitions for any reason should always produce the same increments on
> success.
> With this pragmatic approach, increments from individual partitions / tasks
> are compared to earlier increments. Depending on the strategy of how a new
> increment updates over an earlier increment from the same partition,
> different semantics of accumulators (here called accumulator modes) can be
> implemented:
> - ALL sums over all increments of each partition: this represents the
> current implementation of accumulators
> - MAX over all increments of each partition: assuming accumulators only
> increment while a partition is processed, a successful task provides an
> accumulator value that is always larger than any value of failed tasks, hence
> it paramounts any failed task's value. This produces reliable accumulator
> values. This should only be used in a single stage.
> - LAST increment: allows to retrieve the latest increment for each partition
> only.
> The implementation for MAX and LAST requires extra memory that scales with
> the number of partitions. The current ALL implementation does not require
> extra memory.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]