bjkonglu created SPARK-26135:
--------------------------------
Summary: Structured Streaming reporting metrics programmatically
using asynchronous APIs can't get all queries metrics
Key: SPARK-26135
URL: https://issues.apache.org/jira/browse/SPARK-26135
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 2.3.1
Environment: h3.
Reporter: bjkonglu
h3. Background
When I use Structured Streaming to handle real-time data, I also want to know the
streaming application's metrics, for example processedRowsPerSecond and
inputRowsPerSecond. So I report metrics programmatically using the
asynchronous APIs.
{code:java}
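import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark: SparkSession = ...
spark.streams.addListener(new StreamingQueryListener() {
  // Called when a query is started
  override def onQueryStarted(queryStarted: QueryStartedEvent): Unit = {
    println("Query started: " + queryStarted.id)
  }
  // Called when a query is stopped, with or without an error
  override def onQueryTerminated(queryTerminated: QueryTerminatedEvent): Unit = {
    println("Query terminated: " + queryTerminated.id)
  }
  // Called with a progress update for a running query
  override def onQueryProgress(queryProgress: QueryProgressEvent): Unit = {
    println("Query made progress: " + queryProgress.progress)
  }
})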
{code}
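To tell the queries apart when several run in the same application, each progress event can be tagged with its query id and batch id. The following is only a minimal sketch, assuming Spark 2.3.x; the class name PerQueryProgressListener and its snapshot method are hypothetical and are not part of Spark or of the application above.
{code:scala}
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._
import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Hypothetical helper: keeps the most recent progress of every running query,
// keyed by the query id, and prints one line per progress event.
class PerQueryProgressListener extends StreamingQueryListener {
  private val latest = new ConcurrentHashMap[UUID, StreamingQueryProgress]()

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val p = event.progress
    latest.put(p.id, p)
    // name is null for queries started without queryName(...)
    println(s"query=${p.name} id=${p.id} batch=${p.batchId} ts=${p.timestamp} " +
      s"inputRowsPerSecond=${p.inputRowsPerSecond} " +
      s"processedRowsPerSecond=${p.processedRowsPerSecond}")
  }

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {
    // Drop the entry so the snapshot only covers queries still running
    latest.remove(event.id)
  }

  // Immutable copy of the latest progress per query id
  def snapshot: Map[UUID, StreamingQueryProgress] = latest.asScala.toMap
}
{code}
It would be registered the same way as the listener above, with spark.streams.addListener(new PerQueryProgressListener()). Printing the batch id and timestamp per query makes it easier to see which queries are reporting late.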
h3. Questions
When the streaming application has a single query, the asynchronous APIs work
well. But when the streaming application has many queries, the asynchronous APIs
can't report metrics accurately: some queries report correctly, while others report
late and with lower metric values.
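One way to check whether the listener output is delayed or too low is to compare it with what each query reports synchronously through lastProgress. This is a minimal sketch, assuming Spark 2.3.x; ProgressPoller, describe and report are hypothetical names, not Spark APIs.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

// Hypothetical helper: reads the last progress of every active query directly.
object ProgressPoller {
  // lastProgress is null until the query has completed its first progress update
  def describe(query: StreamingQuery): String =
    Option(query.lastProgress) match {
      case Some(p) =>
        s"query=${query.name} id=${query.id} batch=${p.batchId} ts=${p.timestamp} " +
          s"inputRowsPerSecond=${p.inputRowsPerSecond} " +
          s"processedRowsPerSecond=${p.processedRowsPerSecond}"
      case None =>
        s"query=${query.name} id=${query.id} no progress yet"
    }

  // Prints one line for each query currently active in this session
  def report(spark: SparkSession): Unit =
    spark.streams.active.foreach(q => println(describe(q)))
}
{code}
Calling ProgressPoller.report(spark) periodically and comparing batch ids and timestamps with the listener output should show which queries fall behind in the asynchronous path.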