Jungtaek Lim created SPARK-50007:
------------------------------------

             Summary: metrics defined by observe API are lost when the physical 
node is pruned out
                 Key: SPARK-50007
                 URL: https://issues.apache.org/jira/browse/SPARK-50007
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.5.3, 3.4.3, 4.0.0
            Reporter: Jungtaek Lim


When users define metrics via the observe API, they expect to retrieve those 
metrics via Observation (batch query) or via the update event of 
StreamingQueryListener (streaming query).

But when the CollectMetrics node is lost for any reason (e.g. its subtree is 
pruned by PruneFilters), Spark behaves as if the metrics had never been 
defined, instead of providing default values.

Spark should make a best effort to provide default values. When the optimizer 
prunes the node, that is in most cases logically equivalent to the node having 
processed no input (the exception being a bug in the analyzer, optimizer, etc. 
that drops the node incorrectly), so reporting the default values is valid.
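
The proposed fallback can be sketched in plain Python. This is only an 
illustrative model, not Spark code: MetricDef, default_row and 
resolve_observed_metrics are hypothetical names, and the sketch assumes that 
the "default value" of a metric is its aggregate evaluated over zero input 
rows (a count gives 0, a SQL-style sum gives NULL/None).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List


@dataclass(frozen=True)
class MetricDef:
    """Hypothetical stand-in for one aggregate declared through observe()."""
    name: str
    aggregate: Callable[[Iterable[Any]], Any]


def count_rows(rows: Iterable[Any]) -> int:
    # A count over no rows is 0.
    return sum(1 for _ in rows)


def sum_or_none(rows: Iterable[Any]) -> Any:
    # Mirrors SQL semantics: a sum over no rows is NULL (None here).
    values = list(rows)
    return sum(values) if values else None


def default_row(metrics: List[MetricDef]) -> Dict[str, Any]:
    # Assumption: the default is the aggregate evaluated on empty input.
    return {m.name: m.aggregate([]) for m in metrics}


def resolve_observed_metrics(
    declared: Dict[str, List[MetricDef]],
    collected: Dict[str, Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Return one row per declared observation.

    declared:  observation name -> metrics the user defined.
    collected: observation name -> row reported by the executed plan;
               an observation is absent here if its node was pruned.
    """
    resolved = {}
    for obs_name, metrics in declared.items():
        if obs_name in collected:
            resolved[obs_name] = collected[obs_name]
        else:
            # Node was pruned: fall back to defaults instead of dropping it.
            resolved[obs_name] = default_row(metrics)
    return resolved


declared = {
    "kept": [MetricDef("rows", count_rows)],
    "pruned": [MetricDef("rows", count_rows), MetricDef("total", sum_or_none)],
}
collected = {"kept": {"rows": 42}}  # "pruned" never reported anything

result = resolve_observed_metrics(declared, collected)
```

In this model the observation named "pruned" still appears in the result with 
its default row, which is the behavior requested above; with the current 
behavior it would be missing entirely.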



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
