[
https://issues.apache.org/jira/browse/SPARK-43421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721037#comment-17721037
]
Chaoqin Li edited comment on SPARK-43421 at 5/9/23 6:19 PM:
------------------------------------------------------------
Design doc:
https://docs.google.com/document/d/1c7EkHkguhE7WiIWAKMkvDi3kjd1V98oRtdjk6L_iHoA
> Implement changelog checkpointing for RocksDB state store
> ---------------------------------------------------------
>
> Key: SPARK-43421
> URL: https://issues.apache.org/jira/browse/SPARK-43421
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.4.0
> Reporter: Chaoqin Li
> Priority: Major
>
> We have identified state checkpointing latency as one of the major
> performance bottlenecks for stateful streaming queries. Currently, the RocksDB
> state store pauses the RocksDB instances to upload a snapshot to the cloud
> when committing a batch, which is heavyweight and has unpredictable
> performance.
> To reduce the checkpoint duration and end-to-end latency, we propose
> to:
> 1. During state commit, make the state of a microbatch durable by syncing the
> changelog, instead of the state snapshot, to the checkpoint directory.
> 2. Upload snapshots in the background to enable changelog purging and faster
> failure recovery.
> In this way, the RocksDB instance can run without interruption, which
> improves RocksDB operation performance. This also dramatically reduces the
> commit time and batch duration because less data is uploaded during state
> commit. With this change, stateful queries using the RocksDB state
> store will have lower and more predictable latency.
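
The two proposed steps can be illustrated with a toy model. The sketch below is not Spark or RocksDB code: ToyCheckpointDir, ToyStateStore, and recover are hypothetical names, the "checkpoint directory" is an in-memory dict, and the snapshot upload is an ordinary method call rather than a background task. It only aims to show why committing a changelog is cheaper than committing a snapshot, and how recovery combines the latest snapshot with the changelogs written after it.

```python
from typing import Dict, List, Optional, Tuple

# One changelog record: (key, value). A value of None marks a delete.
Change = Tuple[str, Optional[str]]


class ToyCheckpointDir:
    """Hypothetical stand-in for the checkpoint directory on cloud storage."""

    def __init__(self) -> None:
        self.changelogs: Dict[int, List[Change]] = {}
        self.snapshots: Dict[int, Dict[str, str]] = {}


class ToyStateStore:
    """Hypothetical stand-in for one state store instance."""

    def __init__(self, checkpoint: ToyCheckpointDir) -> None:
        self.checkpoint = checkpoint
        self.state: Dict[str, str] = {}
        self.pending: List[Change] = []  # changes made in the current batch
        self.version = 0

    def put(self, key: str, value: str) -> None:
        self.state[key] = value
        self.pending.append((key, value))

    def remove(self, key: str) -> None:
        self.state.pop(key, None)
        self.pending.append((key, None))

    def commit(self) -> int:
        # Step 1: make the batch durable by writing only its changelog.
        self.version += 1
        self.checkpoint.changelogs[self.version] = self.pending
        self.pending = []
        return self.version

    def upload_snapshot(self) -> None:
        # Step 2: in the proposal this runs in the background; here it is a
        # plain call, and the toy only allows it between batches.
        if self.pending:
            raise ValueError("toy simplification: snapshot only between batches")
        self.checkpoint.snapshots[self.version] = dict(self.state)
        # Changelogs covered by the snapshot are no longer needed for
        # recovering the latest version, so they can be purged.
        for old in [v for v in self.checkpoint.changelogs if v <= self.version]:
            del self.checkpoint.changelogs[old]


def recover(checkpoint: ToyCheckpointDir, version: int) -> Dict[str, str]:
    """Rebuild state at `version`: latest snapshot, then replay changelogs."""
    base = max((v for v in checkpoint.snapshots if v <= version), default=0)
    state = dict(checkpoint.snapshots.get(base, {}))
    for v in range(base + 1, version + 1):
        for key, value in checkpoint.changelogs[v]:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state
```

In this model the cost of commit grows with the number of changes in the batch, not with the total state size, and each uploaded snapshot bounds how many changelogs recovery has to replay.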