[ 
https://issues.apache.org/jira/browse/SPARK-43421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721037#comment-17721037
 ] 

Chaoqin Li edited comment on SPARK-43421 at 5/9/23 6:19 PM:
------------------------------------------------------------

Design doc: 
https://docs.google.com/document/d/1c7EkHkguhE7WiIWAKMkvDi3kjd1V98oRtdjk6L_iHoA

 


was (Author: JIRAUSER295941):
[#Design 
doc]https://docs.google.com/document/d/1c7EkHkguhE7WiIWAKMkvDi3kjd1V98oRtdjk6L_iHoA

 

> Implement changelog checkpointing for RocksDB state store
> ---------------------------------------------------------
>
>                 Key: SPARK-43421
>                 URL: https://issues.apache.org/jira/browse/SPARK-43421
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.0
>            Reporter: Chaoqin Li
>            Priority: Major
>
> We have identified state checkpointing latency as one of the major 
> performance bottlenecks for stateful streaming queries. Currently, the RocksDB 
> state store pauses the RocksDB instance to upload a snapshot to the cloud 
> when committing a batch, which is heavyweight and has unpredictable 
> performance.
> To reduce the checkpoint duration and end-to-end latency, we propose 
> to:
> 1. During state commit, make the state of a microbatch durable by syncing the 
> changelog, instead of the state snapshot, to the checkpoint directory.
> 2. Upload snapshots in the background to enable changelog purging and faster 
> failure recovery.
> This allows the RocksDB instance to run without interruption, which 
> improves RocksDB operation performance. It also dramatically reduces the 
> commit time and batch duration because less data is uploaded during state 
> commit. With this change, stateful queries using the RocksDB state 
> store will have lower and more predictable latency.
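
The two proposed steps can be illustrated with a toy model. The sketch below is not Spark's implementation: the class `ToyChangelogStore`, its methods, and the `<version>.changelog` / `<version>.snapshot` file naming are all hypothetical, a plain dict stands in for the RocksDB instance, and the "background" snapshot upload is called synchronously to keep the example deterministic. It only shows the general idea of committing a small changelog per microbatch and recovering from the latest snapshot plus the changelogs that follow it.

```python
import json
import os
import tempfile


class ToyChangelogStore:
    """Toy key-value store illustrating changelog checkpointing.

    All names are hypothetical; this is not Spark's RocksDB state store.
    """

    def __init__(self, checkpoint_dir):
        self.checkpoint_dir = checkpoint_dir
        self.state = {}    # in-memory stand-in for the RocksDB instance
        self.pending = []  # writes made since the last commit
        self.version = 0   # last committed microbatch version

    def put(self, key, value):
        self.state[key] = value
        self.pending.append(("put", key, value))

    def remove(self, key):
        self.state.pop(key, None)
        self.pending.append(("remove", key, None))

    def _path(self, version, kind):
        return os.path.join(self.checkpoint_dir, f"{version}.{kind}")

    def commit(self):
        """Step 1: make the batch durable by writing only its changelog."""
        self.version += 1
        with open(self._path(self.version, "changelog"), "w") as f:
            json.dump(self.pending, f)
        self.pending = []
        return self.version

    def upload_snapshot(self):
        """Step 2: write a full snapshot, then purge the changelogs it covers."""
        if self.pending:
            raise RuntimeError("snapshot only committed state")
        with open(self._path(self.version, "snapshot"), "w") as f:
            json.dump(self.state, f)
        for name in os.listdir(self.checkpoint_dir):
            stem, kind = name.split(".")
            if kind == "changelog" and int(stem) <= self.version:
                os.remove(os.path.join(self.checkpoint_dir, name))

    @classmethod
    def recover(cls, checkpoint_dir, target_version):
        """Load the newest snapshot <= target_version, then replay changelogs."""
        store = cls(checkpoint_dir)
        snapshots = [int(n.split(".")[0])
                     for n in os.listdir(checkpoint_dir)
                     if n.endswith(".snapshot")]
        base = max((v for v in snapshots if v <= target_version), default=0)
        if base > 0:
            with open(store._path(base, "snapshot")) as f:
                store.state = json.load(f)
        for v in range(base + 1, target_version + 1):
            with open(store._path(v, "changelog")) as f:
                for op, key, value in json.load(f):
                    if op == "put":
                        store.state[key] = value
                    else:
                        store.state.pop(key, None)
        store.version = target_version
        return store


def demo():
    with tempfile.TemporaryDirectory() as d:
        store = ToyChangelogStore(d)
        store.put("a", 1)
        store.commit()           # version 1: changelog only
        store.put("b", 2)
        store.commit()           # version 2: changelog only
        store.upload_snapshot()  # snapshot at version 2, older changelogs purged
        store.put("a", 10)
        store.remove("b")
        store.commit()           # version 3: changelog only
        recovered = ToyChangelogStore.recover(d, 3)
        return recovered.state, sorted(os.listdir(d))
```

In this toy, each commit writes only the keys touched in that batch, while the full-state write happens outside the commit path. Recovery cost is then bounded by how many changelogs have accumulated since the last snapshot, which is why the snapshot upload also enables changelog purging.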


