Jungtaek Lim created SPARK-37224:
------------------------------------
Summary: Optimize write path on RocksDB state store provider
Key: SPARK-37224
URL: https://issues.apache.org/jira/browse/SPARK-37224
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Jungtaek Lim
We figured out that RocksDB class does additional lookup on the key for write
operations (put/delete) to track the number of rows. This is required to
fulfill the metric of the state store, but after benchmarking it turns out
performance hit is significant.
We can't find a good alternative to retain the number of rows without
additional lookup, so we are proposing a new config to flag tracking the number
of rows.
* *config name: spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows*
* *default value: true* (since we already serve the number and we want to
avoid breaking change)
*We will give "0" for the number of keys in the state store metric when the
config is turned off.* The ideal value seems to be a negative one, but
currently SQL metric doesn't allow negative value and there seems to be some
technical/historical issue not to.
*We will also handle the case the config is flipped during restart* - this will
enable the way end users enjoy the benefit but also not lose the chance to know
the number of state rows. End users can turn off the flag to maximize the
performance, and turn on the flag (restart required) when they want to see the
actual number of keys (for observability/debugging/etc).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]