[ https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290771#comment-17290771 ]
L. C. Hsieh commented on SPARK-34198:
-------------------------------------
FYI, I ran a benchmark against two open source implementations. Chermenin's
implementation is consistently about 30% slower than Qubole's.
{code}
[info] Running benchmark: Put key value pairs
[info] Running case: Chermenin
[info] Stopped after 3 iterations, 103521 ms
[info] Running case: Qubole
[info] Stopped after 3 iterations, 76908 ms
[info] OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.7
[info] Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
[info] Put key value pairs:          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Chermenin                             32725          34507        2030         0.0       49934.0       1.0X
[info] Qubole                                25493          25636         186         0.0       38899.0       1.3X
{code}
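As a rough cross-check of the numbers above, the Relative column and the "about 30%" figure can be recomputed from the best times. This is a minimal sketch in Python, not part of the benchmark itself: the function names are illustrative, and the formula (baseline best time divided by each case's best time, with the first case as baseline) is an assumption that happens to agree with the reported values.

```python
# Best times (ms) copied from the benchmark output above.
best_ms = {"Chermenin": 32725, "Qubole": 25493}

# Assumption: the first case listed (Chermenin) is the baseline
# for the Relative column.
baseline = best_ms["Chermenin"]


def relative(case: str) -> float:
    """Baseline best time divided by this case's best time."""
    return baseline / best_ms[case]


def extra_time_pct(slow: str, fast: str) -> float:
    """How much longer `slow` takes than `fast`, in percent."""
    return (best_ms[slow] / best_ms[fast] - 1.0) * 100.0


for case in best_ms:
    print(f"{case}: {relative(case):.1f}X")

pct = extra_time_pct("Chermenin", "Qubole")
print(f"Chermenin takes {pct:.0f}% longer than Qubole (best times)")
```

On best times the gap comes out near 28%; on the average times (34507 ms vs 25636 ms) it is closer to 35%, so "about 30%" sits between the two.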
> Add RocksDB StateStore as external module
> -----------------------------------------
>
> Key: SPARK-34198
> URL: https://issues.apache.org/jira/browse/SPARK-34198
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.2.0
> Reporter: L. C. Hsieh
> Priority: Major
>
> Currently Spark SS has only one built-in StateStore implementation,
> HDFSBackedStateStore, which actually uses an in-memory map to store state
> rows. As more and more streaming applications appear, some of them require
> large state in stateful operations such as streaming aggregation and join.
> Several other major streaming frameworks already use RocksDB for state
> management, so it has proven to be a good choice for large state. But
> Spark SS still lacks a built-in state store that meets this requirement.
> We would like to explore the possibility of adding a RocksDB-based StateStore
> to Spark SS. To address the concern about adding RocksDB as a direct
> dependency, our plan is to add this StateStore as an external module first.