Zeyu Chen created SPARK-51358:
---------------------------------
Summary: Introduce snapshot upload lag detection through StateStoreCoordinator
Key: SPARK-51358
URL: https://issues.apache.org/jira/browse/SPARK-51358
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 4.0.0, 4.1
Reporter: Zeyu Chen
As a first step toward increasing visibility into snapshot upload lag, we
want to add a snapshot lag alerting system. Using the state store coordinator,
the driver will log warnings that identify the specific state store instances
whose snapshot uploads are falling behind.
These driver logs can feed dashboards and alerts, helping us understand how
often lag occurs in production and what patterns it follows. The collected
data will also inform future remediation strategies.
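
The detection idea described above could be sketched roughly as follows. This
is only an illustration, not the proposed implementation and not Spark's
StateStoreCoordinator API: the class SnapshotLagTracker, its methods, the
store IDs, and the max_version_lag threshold are all hypothetical names chosen
for the example. It assumes each state store instance reports the version of
its latest uploaded snapshot, and that an instance counts as lagging once it
trails the current version by more than the threshold.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, List

logger = logging.getLogger("snapshot_lag_sketch")


@dataclass
class SnapshotLagTracker:
    """Hypothetical driver-side tracker (not a real Spark class)."""

    # Assumed threshold: how many versions an instance may trail before
    # it is reported as lagging.
    max_version_lag: int = 5
    # Latest uploaded snapshot version seen per state store instance.
    latest_uploaded: Dict[str, int] = field(default_factory=dict)

    def report_upload(self, store_id: str, version: int) -> None:
        """Record an upload, keeping the highest version seen so far."""
        previous = self.latest_uploaded.get(store_id, -1)
        self.latest_uploaded[store_id] = max(previous, version)

    def lagging_stores(self, current_version: int) -> List[str]:
        """Return IDs of instances trailing by more than the threshold.

        Instances that have never reported an upload are not tracked in
        this sketch, so they are not detected here.
        """
        return [
            store_id
            for store_id, version in sorted(self.latest_uploaded.items())
            if current_version - version > self.max_version_lag
        ]

    def log_lagging(self, current_version: int) -> List[str]:
        """Emit one warning per lagging instance and return their IDs."""
        lagging = self.lagging_stores(current_version)
        for store_id in lagging:
            logger.warning(
                "State store %s is lagging: latest snapshot version %d, "
                "current version %d",
                store_id,
                self.latest_uploaded[store_id],
                current_version,
            )
        return lagging
```

In this sketch the lag is measured in versions only. A time-based check (for
example, time since the last upload) would be an equally plausible variant.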