Davin Tjong created SPARK-46294:
-----------------------------------
Summary: Clean up initValue vs zeroValue semantics in SQLMetrics
Key: SPARK-46294
URL: https://issues.apache.org/jira/browse/SPARK-46294
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.5.0
Reporter: Davin Tjong
The semantics of initValue and _zeroValue in SQLMetrics are somewhat
confusing, since they effectively mean the same thing. Changing them to the
following would be clearer, especially in terms of defining what an "invalid"
metric is.
Proposed definitions:
initValue is the starting value for a SQLMetric. If a metric has value equal to
its initValue, then it should be filtered out before aggregating with
SQLMetrics.stringValue().
zeroValue defines the lowest value considered valid. If a SQLMetric is invalid,
it is set to zeroValue upon receiving any updates, and it also reports
zeroValue as its value to avoid exposing the invalid value to the user
programmatically (a concern previously addressed in SPARK-41442).
For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that
the metric is invalid by default. At the end of a task, we update the metric,
making it valid, and the metrics that are still invalid are filtered out when
calculating min, max, etc. as a workaround for SPARK-11013.
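
The proposed definitions can be illustrated with a small toy model. This is
only a sketch in Python under stated assumptions, not Spark's Scala
implementation: the class ToyMetric, its raw field, and the aggregate helper
are hypothetical names introduced here for illustration.

```python
class ToyMetric:
    """Toy model of the proposed semantics (hypothetical, not Spark's SQLMetric)."""

    def __init__(self, init_value=-1, zero_value=0):
        self.init_value = init_value  # starting value of the metric
        self.zero_value = zero_value  # lowest value considered valid
        self.raw = init_value         # internal value, may be invalid

    def is_valid(self):
        return self.raw >= self.zero_value

    def add(self, v):
        # An invalid metric is first reset to zero_value on any update.
        if not self.is_valid():
            self.raw = self.zero_value
        self.raw += v

    def value(self):
        # An invalid metric reports zero_value, so -1 is never exposed.
        return self.raw if self.is_valid() else self.zero_value


def aggregate(metrics):
    """Drop metrics still at init_value, then compute min, max and total."""
    vals = [m.raw for m in metrics if m.raw != m.init_value]
    if not vals:
        return None
    return {"min": min(vals), "max": max(vals), "total": sum(vals)}
```

In this model, a metric that was never updated reports 0 through value() but
is excluded by aggregate(), while a metric that was explicitly updated with 0
stays in the aggregation.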