[ 
https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanjian Li updated SPARK-34198:
--------------------------------
    Description: 
Currently Spark SS only has one built-in StateStore implementation 
HDFSBackedStateStore. Actually it uses in-memory map to store state rows. As 
there are more and more streaming applications, some of them requires to use 
large state in stateful operations such as streaming aggregation and join.

Several other major streaming frameworks already use RocksDB for state 
management. So it is proven to be good choice for large state usage. But Spark 
SS still lacks of a built-in state store for the requirement.

We would like to explore the possibility to add RocksDB-based StateStore into 
Spark SS.

 

  was:
Currently Spark SS only has one built-in StateStore implementation 
HDFSBackedStateStore. Actually it uses in-memory map to store state rows. As 
there are more and more streaming applications, some of them requires to use 
large state in stateful operations such as streaming aggregation and join.

Several other major streaming frameworks already use RocksDB for state 
management. So it is proven to be good choice for large state usage. But Spark 
SS still lacks of a built-in state store for the requirement.

We would like to explore the possibility to add RocksDB-based StateStore into 
Spark SS. For the concern about adding RocksDB as a direct dependency, our plan 
is to add this StateStore as an external module first.


> Add RocksDB StateStore as external module
> -----------------------------------------
>
>                 Key: SPARK-34198
>                 URL: https://issues.apache.org/jira/browse/SPARK-34198
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Priority: Major
>
> Currently Spark SS only has one built-in StateStore implementation 
> HDFSBackedStateStore. Actually it uses in-memory map to store state rows. As 
> there are more and more streaming applications, some of them requires to use 
> large state in stateful operations such as streaming aggregation and join.
> Several other major streaming frameworks already use RocksDB for state 
> management. So it is proven to be good choice for large state usage. But 
> Spark SS still lacks of a built-in state store for the requirement.
> We would like to explore the possibility to add RocksDB-based StateStore into 
> Spark SS.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to