[
https://issues.apache.org/jira/browse/SPARK-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-28120:
----------------------------------
Affects Version/s: (was: 2.4.3)
3.0.0
> RocksDB state storage
> ---------------------
>
> Key: SPARK-28120
> URL: https://issues.apache.org/jira/browse/SPARK-28120
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.0.0
> Reporter: Vikram Agrawal
> Priority: Major
>
> SPARK-13809 introduced a framework for state management for computing
> Streaming Aggregates. The default implementation was in-memory hashmap which
> was backed up in HDFS complaint file system at the end of every micro-batch.
> Current implementation suffers from Performance and Latency Issues. It uses
> Executor JVM memory to store the states. State store size is limited by the
> size of the executor memory. Also
> Executor JVM memory is shared by state storage and other tasks operations.
> State storage size will impact the performance of task execution
> Moreover, GC pauses, executor failures, OOM issues are common when the size
> of state storage increases which increases overall latency of a micro-batch
> RocksDb is an embedded DB which can provide major performance improvements.
> Other major streaming frameworks have rocksdb as default state storage.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]