Michael Armbrust created SPARK-18516:
----------------------------------------

             Summary: Separate instantaneous state from progress performance 
statistics
                 Key: SPARK-18516
                 URL: https://issues.apache.org/jira/browse/SPARK-18516
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
            Reporter: Michael Armbrust
            Assignee: Michael Armbrust
            Priority: Blocker


There are two types of information that you want to be able to extract from a 
running query: instantaneous _status_ and metrics about the performance as make 
_progress_ in query processing.

Today, these are conflated in a single {{StreamingQueryStatus}} object.  The 
downside to this approach is that a user now needs to reason about what state 
the query is in anytime they retrieve a status object.  Fields like 
{{statusMessage}} don't appear in messages that come from listener bus.  And 
inputRate/processingRate statistics are usually {{0}} when you retrieve a 
status object from the query itself.

I propose we make the follow changes:
 - Make {{status}} only report instantaneous things, such as if data is 
available or a human readable message about what phase we are currently in.
 - Have a separate {{progress}} message that we report for each trigger with 
the other performance information that lives in status today.  You should be 
able to easily retrieve a configurable number of the most recent progress 
messages instead of just the most recent.

While we are making these changes, I propose that we also change {{id}} to be a 
globally unique identifier, rather than a JVM unique one.  Without this its 
hard to correlate performance across restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to