Jungtaek Lim created SPARK-50007:
------------------------------------
Summary: metrics defined by observe API are lost when the physical
node is pruned out
Key: SPARK-50007
URL: https://issues.apache.org/jira/browse/SPARK-50007
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.5.3, 3.4.3, 4.0.0
Reporter: Jungtaek Lim
When users define metrics via the observe API, they expect to retrieve those
metrics through Observation (batch query) or through the update event of
StreamingQueryListener.
But when the CollectMetrics node is lost for any reason (e.g. its subtree is
pruned by PruneFilters), Spark behaves as if the metrics were never defined,
instead of providing default values.
Spark should make a best effort to provide default values. When the optimizer
prunes the node, this is in most cases logically equivalent to the node having
processed no input (the exception being a bug in the analyzer, optimizer, etc.
that drops the node incorrectly), hence it is valid to report just the default
values.
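
The requested fallback can be illustrated with a small plain-Python model. This is only a sketch of the idea, not Spark code and not the proposed fix: the function name fill_missing_metrics, the metric names, and the choice of 0 as the default for count-like metrics are all assumptions made for illustration.

```python
from typing import Any, Dict, Mapping


def fill_missing_metrics(
    declared_defaults: Mapping[str, Mapping[str, Any]],
    observed: Mapping[str, Mapping[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Return one metrics row per declared observation name.

    declared_defaults: hypothetical registry mapping each observation name
        to the values its metrics would have over empty input.
    observed: metrics actually reported by the executed plan; a name is
        absent here when its collecting node was pruned.
    """
    result: Dict[str, Dict[str, Any]] = {}
    for name, defaults in declared_defaults.items():
        row = dict(defaults)                # start from the defaults
        row.update(observed.get(name, {}))  # overlay whatever was reported
        result[name] = row
    return result


# Hypothetical observation declared by the user.
declared = {"my_event": {"rows": 0, "errors": 0}}

# The collecting node was pruned, so the plan reported nothing.
reported: Dict[str, Dict[str, Any]] = {}

print(fill_missing_metrics(declared, reported))
# prints {'my_event': {'rows': 0, 'errors': 0}}
```

With this behavior a listener always sees an entry for every declared observation, so it does not have to distinguish "metric not defined" from "no input reached the node".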