[ 
https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290771#comment-17290771
 ] 

L. C. Hsieh commented on SPARK-34198:
-------------------------------------

FYI, I ran a benchmark against two open source implementations. Chermenin's 
implementation is consistently about 30% slower than Qubole's.

{code}
[info] Running benchmark: Put key value pairs
[info]   Running case: Chermenin
[info]   Stopped after 3 iterations, 103521 ms
[info]   Running case: Qubole
[info]   Stopped after 3 iterations, 76908 ms
[info] OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.7
[info] Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
[info] Put key value pairs:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Chermenin                                         32725          34507        2030          0.0       49934.0       1.0X
[info] Qubole                                            25493          25636         186          0.0       38899.0       1.3X
{code}
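
As a rough cross-check of the "about 30%" figure, the numbers in the table can be compared directly. The snippet below is only a sketch: the helper names are made up for illustration, and it assumes the Relative column is the baseline's best time divided by each case's best time, which is consistent with the 1.3X shown above.

```python
# Timings in milliseconds, copied from the benchmark output above.
best_ms = {"Chermenin": 32725, "Qubole": 25493}
avg_ms = {"Chermenin": 34507, "Qubole": 25636}


def slowdown_pct(slow_ms, fast_ms):
    """Percentage by which slow_ms exceeds fast_ms (hypothetical helper)."""
    return (slow_ms - fast_ms) / fast_ms * 100.0


def relative(baseline_ms, other_ms):
    """Speed of a case relative to the baseline.

    Assumption: this mirrors how the Relative column is derived.
    """
    return baseline_ms / other_ms


if __name__ == "__main__":
    by_best = slowdown_pct(best_ms["Chermenin"], best_ms["Qubole"])
    by_avg = slowdown_pct(avg_ms["Chermenin"], avg_ms["Qubole"])
    rel = relative(best_ms["Chermenin"], best_ms["Qubole"])
    print(f"Chermenin slower by best time: {by_best:.1f}%")
    print(f"Chermenin slower by avg time:  {by_avg:.1f}%")
    print(f"Qubole relative to Chermenin:  {rel:.1f}X")
```

Depending on whether best or average time is used, the gap comes out a little below or a little above 30%, so "about 30%" is a fair summary of both.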

> Add RocksDB StateStore as external module
> -----------------------------------------
>
>                 Key: SPARK-34198
>                 URL: https://issues.apache.org/jira/browse/SPARK-34198
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Priority: Major
>
> Currently Spark SS has only one built-in StateStore implementation, 
> HDFSBackedStateStore, which actually uses an in-memory map to store state 
> rows. As streaming applications become more common, some of them need to use 
> large state in stateful operations such as streaming aggregation and join.
> Several other major streaming frameworks already use RocksDB for state 
> management, so it is proven to be a good choice for large state usage. But 
> Spark SS still lacks a built-in state store that meets this requirement.
> We would like to explore the possibility of adding a RocksDB-based StateStore 
> to Spark SS. To address the concern about adding RocksDB as a direct 
> dependency, our plan is to add this StateStore as an external module first.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
