Zeyu Chen created SPARK-51252:
---------------------------------
Summary: Adding state store level metrics for last uploaded
snapshot version in HDFS State Stores
Key: SPARK-51252
URL: https://issues.apache.org/jira/browse/SPARK-51252
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 4.0.0, 4.1.0
Reporter: Zeyu Chen
Following SPARK-51097, we would like to introduce the same level of
observability to HDFSBackedStateStore.
Adding state store "instance" metrics to StreamingQueryProgress that track the
latest snapshot version uploaded in HDFS state stores should address three
observability challenges:
* Identifying uneven partition starvation, i.e. partitions with slow state
maintenance,
* Finding missing snapshots across versions, so that we can minimize extensive
replays during recovery,
* Identifying performance instability, for example by gaining insight into
snapshot upload patterns.
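As a rough illustration of how such a metric could be consumed, the sketch
below flags partitions whose last uploaded snapshot trails the current version
by more than a threshold. The function name, the input mapping, and the
threshold are all hypothetical and not part of this proposal or of any Spark
API; the sketch only assumes that a per-partition "last uploaded snapshot
version" value can be read from the progress report.

```python
from typing import Dict, List


def lagging_partitions(
    last_uploaded: Dict[int, int],
    current_version: int,
    max_lag: int,
) -> List[int]:
    """Return ids of partitions whose last uploaded snapshot version
    trails current_version by more than max_lag versions."""
    return sorted(
        pid
        for pid, version in last_uploaded.items()
        if current_version - version > max_lag
    )


# Hypothetical sample data: partition id -> last uploaded snapshot version.
sample = {0: 98, 1: 100, 2: 61, 3: 99}
print(lagging_partitions(sample, current_version=100, max_lag=10))
```

With this sample, only partition 2 exceeds the allowed lag, which is the kind
of starved partition the first bullet above is concerned with.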
The instance metrics should be kept as general as possible, so that future
instance metrics for observability can be added with minimal refactoring.
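One possible shape for such a general design, sketched in Python purely for
illustration: each instance metric is described by a small descriptor, and a
single reporting routine works for any descriptor. The class name, the metric
name, the key format, and the -1 default are assumptions made for this sketch
and do not reflect the actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InstanceMetric:
    """Hypothetical descriptor for one per-store-instance metric."""
    name: str
    default: int = -1  # assumed sentinel used when no value is known yet

    def key(self, partition_id: int, store_name: str) -> str:
        # Assumed naming scheme; the real key format is not specified here.
        return f"{self.name}.partition_{partition_id}_{store_name}"


def report(
    metric: InstanceMetric,
    values: Dict[Tuple[int, str], int],
    instances: List[Tuple[int, str]],
) -> Dict[str, int]:
    """Flatten per-instance values into a metric-key -> value map."""
    return {
        metric.key(pid, store): values.get((pid, store), metric.default)
        for pid, store in instances
    }


# Adding a new instance metric only requires a new descriptor.
snapshot_metric = InstanceMetric("SnapshotLastUploaded")
instances = [(0, "default"), (1, "default")]
print(report(snapshot_metric, {(0, "default"): 42}, instances))
```

Keeping the descriptor separate from the reporting routine means a future
metric reuses the same reporting path without changes to it.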