[ 
https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276883#comment-17276883
 ] 

Jungtaek Lim commented on SPARK-34198:
--------------------------------------

"External modules" means the modules in the external directory.

Personally I don't think there's a huge difference between adding it in the 
spark-sql core module vs. adding it via an external module. The major point is 
whether we want to add the functionality to the Spark codebase or not. As we 
already confirmed there are concerns about adding this to the Spark codebase, 
unless you raise the discussion on the dev@ mailing list and gather consensus, 
the effort can easily be wasted. Please make sure that doesn't happen.

And once we decide to add this, I'd like to see either that we persuade the 
repo owner to contribute the well-known existing implementation 
(https://github.com/chermenin/spark-states) to the ASF, or a new PR based on 
#24922. I wouldn't like to review multiple PRs again and again for the same 
functionality.



> Add RocksDB StateStore as external module
> -----------------------------------------
>
>                 Key: SPARK-34198
>                 URL: https://issues.apache.org/jira/browse/SPARK-34198
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> Currently Spark SS has only one built-in StateStore implementation,
> HDFSBackedStateStore, which actually uses an in-memory map to store state
> rows. As there are more and more streaming applications, some of them require
> large state in stateful operations such as streaming aggregation and join.
> Several other major streaming frameworks already use RocksDB for state
> management, so it is proven to be a good choice for large state usage. But
> Spark SS still lacks a built-in state store that meets this requirement.
> We would like to explore the possibility of adding a RocksDB-based StateStore
> to Spark SS. To address the concern about adding RocksDB as a direct
> dependency, our plan is to add this StateStore as an external module first.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
