[ https://issues.apache.org/jira/browse/SPARK-51097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zeyu Chen updated SPARK-51097:
------------------------------
Description:
We currently lack detailed visibility into state maintenance at the state store
level in RocksDB. This makes it difficult to identify performance degradation
issues in maintenance tasks.
To remediate this, we will add state store "instance" metrics to
StreamingQueryProgress to track the latest snapshot version uploaded in RocksDB.
This improvement addresses three challenges in observability:
* Identifying uneven partition starvation, where some partitions have slow
state maintenance,
* Finding missing snapshots across versions, so we minimize extensive replays
during recovery,
* Identifying performance instability, for example by gaining insight into
snapshot upload patterns.
was:
We currently lack detailed visibility into partition-level state maintenance in
RocksDB. This makes it difficult to identify performance degradation issues in
maintenance tasks.
To remediate this, we will add partition-level metrics to
StreamingQueryProgress to track the latest snapshot version uploaded in RocksDB.
This improvement addresses three challenges in observability:
* Identifying uneven partition starvation, where some partitions have slow
state maintenance,
* Finding missing snapshots across versions, so we minimize extensive replays
during recovery,
* Identifying performance instability, for example by gaining insight into
snapshot upload patterns.
> Adding state store level metrics for last uploaded snapshot version in RocksDB
> ------------------------------------------------------------------------------
>
> Key: SPARK-51097
> URL: https://issues.apache.org/jira/browse/SPARK-51097
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 4.0.0, 4.1.0
> Reporter: Zeyu Chen
> Priority: Minor
> Labels: pull-request-available
>
> We currently lack detailed visibility into state maintenance at the state
> store level in RocksDB. This makes it difficult to identify performance
> degradation issues in maintenance tasks.
> To remediate this, we will add state store "instance" metrics to
> StreamingQueryProgress to track the latest snapshot version uploaded in
> RocksDB.
> This improvement addresses three challenges in observability:
> * Identifying uneven partition starvation, where some partitions have slow
> state maintenance,
> * Finding missing snapshots across versions, so we minimize extensive
> replays during recovery,
> * Identifying performance instability, for example by gaining insight into
> snapshot upload patterns.
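
As a rough illustration of the first challenge, a per-partition "last uploaded snapshot version" value could be used to flag partitions with slow state maintenance. The sketch below is not the proposed implementation. The function name, the dictionary layout (partition id mapped to last uploaded snapshot version), and the sample values are all hypothetical, since the issue does not specify how the metrics will be named or exposed in StreamingQueryProgress.

```python
from typing import Dict, List


def find_lagging_partitions(
    last_uploaded: Dict[int, int],
    current_version: int,
    max_lag: int,
) -> List[int]:
    """Return ids of partitions whose last uploaded snapshot version
    trails current_version by more than max_lag versions.

    All names here are hypothetical; the real metric names and layout
    are not given in the issue description.
    """
    return sorted(
        partition_id
        for partition_id, version in last_uploaded.items()
        if current_version - version > max_lag
    )


# Hypothetical sample: partition id -> last uploaded snapshot version.
last_uploaded = {0: 98, 1: 99, 2: 41, 3: 97}

# Partition 2 is 59 versions behind, so it is the only one flagged.
lagging = find_lagging_partitions(last_uploaded, current_version=100, max_lag=10)
print(lagging)
```

A large gap for a single partition would also mean a long replay of changes on recovery, which relates to the second challenge.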