[
https://issues.apache.org/jira/browse/SPARK-46294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davin Tjong updated SPARK-46294:
--------------------------------
Component/s: SQL
(was: Spark Core)
> Clean up initValue vs zeroValue semantics in SQLMetrics
> -------------------------------------------------------
>
> Key: SPARK-46294
> URL: https://issues.apache.org/jira/browse/SPARK-46294
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Davin Tjong
> Priority: Minor
>
> The semantics of initValue and _zeroValue in SQLMetrics is a little bit
> confusing, since they effectively mean the same thing. Changing it to the
> following would be clearer, especially in terms of defining what an "invalid"
> metric is.
>
> proposed definitions:
>
> initValue is the starting value for a SQLMetric. If a metric has value equal
> to its initValue, then it should be filtered out before aggregating with
> SQLMetrics.stringValue().
>
> zeroValue defines the lowest value considered valid. If a SQLMetric is
> invalid, it is set to zeroValue upon receiving any updates, and it also
> reports zeroValue as its value to avoid exposing it to the user
> programatically (concern previouosly addressed in SPARK-41442).
> For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that
> the metric is by default invalid. At the end of a task, we will update the
> metric making it valid, and the invalid metrics will be filtered out when
> calculating min, max, etc. as a workaround for SPARK-11013.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]