[
https://issues.apache.org/jira/browse/SPARK-37224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jungtaek Lim resolved SPARK-37224.
----------------------------------
Fix Version/s: 3.3.0
Resolution: Fixed
Issue resolved by pull request 34502
[https://github.com/apache/spark/pull/34502]
> Optimize write path on RocksDB state store provider
> ---------------------------------------------------
>
> Key: SPARK-37224
> URL: https://issues.apache.org/jira/browse/SPARK-37224
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.3.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
> Fix For: 3.3.0
>
>
> We found that the RocksDB class performs an additional lookup on the key for
> every write operation (put/delete) in order to track the number of rows. The
> lookup is needed to populate the row-count metric of the state store, but
> benchmarking shows that the performance hit is significant.
> We could not find a good alternative that retains the number of rows without
> the additional lookup, so we propose a new config that controls whether the
> number of rows is tracked.
> * *config name: spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows*
> * *default value: true* (since we already serve the number and we want to
> avoid a breaking change)
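
To illustrate the trade-off described above, here is a minimal sketch in Python. It is not the Spark RocksDB class: `ToyStateStore`, its `lookups` counter, and the method names are hypothetical stand-ins, and a plain dict replaces RocksDB. It only shows why counting rows forces a read before every write, and how a flag like `trackTotalNumberOfRows` could skip that read.

```python
class ToyStateStore:
    """Toy in-memory stand-in for a key-value state store (hypothetical, not Spark code)."""

    def __init__(self, track_total_number_of_rows=True):
        # Mirrors the idea of the proposed config; default is True, as in the proposal.
        self.track = track_total_number_of_rows
        self._data = {}
        self._num_keys = 0
        self.lookups = 0  # counts the extra reads done only for row bookkeeping

    def _exists(self, key):
        # The "additional lookup": a read whose only purpose is to keep the count right.
        self.lookups += 1
        return key in self._data

    def put(self, key, value):
        # Only a new key increases the row count, so tracking needs a lookup first.
        if self.track and not self._exists(key):
            self._num_keys += 1
        self._data[key] = value

    def remove(self, key):
        # Only removing an existing key decreases the row count.
        if self.track and self._exists(key):
            self._num_keys -= 1
        self._data.pop(key, None)

    def num_keys_metric(self):
        # With tracking off there is no reliable count, so the sketch reports 0.
        return self._num_keys if self.track else 0


tracked = ToyStateStore(track_total_number_of_rows=True)
untracked = ToyStateStore(track_total_number_of_rows=False)
for store in (tracked, untracked):
    store.put("a", 1)
    store.put("a", 2)   # overwrite, row count unchanged
    store.put("b", 3)
    store.remove("a")
```

In this toy run the tracked store performs one extra lookup per write, while the untracked store performs none and reports 0 for the metric.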
> *We will report "0" as the number of keys in the state store metric when the
> config is turned off.* A negative value would be the ideal marker, but SQL
> metrics currently do not allow negative values, apparently for
> technical/historical reasons.
> *We will also handle the case where the config is flipped across a restart* -
> this lets end users get the performance benefit without losing the ability
> to see the number of state rows. End users can turn the flag off to maximize
> performance, and turn it back on (restart required) when they want to
> see the actual number of keys (for observability, debugging, etc.).
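
The description does not say how a flipped config is handled across a restart. The sketch below shows one plausible approach, purely as an assumption: a checkpoint records an "unknown" sentinel when tracking was off, and a one-time scan recounts the keys when tracking is turned on again. The names `UNKNOWN`, `commit`, `load`, and `reported_metric` are hypothetical and are not taken from the pull request.

```python
UNKNOWN = -1  # hypothetical internal sentinel: the row count was not tracked


def commit(data, track):
    """Return a toy checkpoint: a copy of the data plus the recorded row count."""
    return {"data": dict(data), "num_keys": len(data) if track else UNKNOWN}


def load(checkpoint, track):
    """Restore a toy checkpoint under the current value of the tracking flag."""
    data = dict(checkpoint["data"])
    if not track:
        # Tracking is off now: do not pay for counting, keep the count unknown.
        return data, UNKNOWN
    num_keys = checkpoint["num_keys"]
    if num_keys == UNKNOWN:
        # Flag flipped from off to on: recount once with a full scan of the keys.
        num_keys = sum(1 for _ in data)
    return data, num_keys


def reported_metric(num_keys):
    # The metric cannot be negative, so an unknown count is surfaced as 0.
    return max(num_keys, 0)


# Written with tracking off, then restarted with tracking on.
checkpoint = commit({"a": 1, "b": 2}, track=False)
_, recounted = load(checkpoint, track=True)
```

Under these assumptions the cost of the flip is a single scan at load time, after which the count can again be maintained incrementally on each write.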