[ https://issues.apache.org/jira/browse/SPARK-53792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-53792:
-----------------------------------
Labels: pull-request-available (was: )
> Fix rocksdbPinnedBlocksMemoryUsage when bounded memory usage is enabled
> -----------------------------------------------------------------------
>
> Key: SPARK-53792
> URL: https://issues.apache.org/jira/browse/SPARK-53792
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 4.0.1
> Reporter: Zifei Feng
> Priority: Minor
> Labels: pull-request-available
>
> The rocksdbPinnedBlocksMemoryUsage metric was never corrected for the case
> where bounded memory usage is enabled. Currently, it is collected for each
> RocksDB instance without accounting for the fact that all RocksDB instances
> on the executor share the same cache, which leads to double counting.
> This is where we collect the cache memory metric:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala#L2045]
> We are currently using getDBProperty("rocksdb.block-cache-pinned-usage") to
> read the pinned blocks for each DB. When the block cache is shared, this is
> wrong because:
> * *Instance-specific vs. global stats*: Database properties like
> "rocksdb.block-cache-pinned-usage" report the memory size of entries
> requested specifically by that DB instance.
> * *Double-counting potential*: If you query the DB property on each
> instance and add the results together, you could double-count, because a
> single block in the shared cache could be used by more than one DB instance.
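The double-counting concern in the two bullets above can be illustrated with
a toy model. This is only a sketch of the arithmetic as the ticket describes
it, not RocksDB's internals: the class name, the fixed 4096-byte block size,
and the block IDs are all hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DoubleCountSketch {
    // Hypothetical fixed block size, used only to turn block counts into bytes.
    static final long BLOCK_BYTES = 4096L;

    // Adds up each instance's own view; a block pinned by two instances is counted twice.
    static long sumOfPerInstanceViews(List<Set<Integer>> pinnedByInstance) {
        long total = 0;
        for (Set<Integer> blocks : pinnedByInstance) {
            total += blocks.size() * BLOCK_BYTES;
        }
        return total;
    }

    // Cache-wide view; every distinct pinned block is counted exactly once.
    static long cacheWideView(List<Set<Integer>> pinnedByInstance) {
        Set<Integer> distinct = new HashSet<>();
        for (Set<Integer> blocks : pinnedByInstance) {
            distinct.addAll(blocks);
        }
        return distinct.size() * BLOCK_BYTES;
    }

    public static void main(String[] args) {
        // Two hypothetical DB instances; block 2 is pinned by both.
        List<Set<Integer>> pinned = List.of(Set.of(1, 2), Set.of(2, 3));
        System.out.println("summed per-instance: " + sumOfPerInstanceViews(pinned));
        System.out.println("cache-wide:          " + cacheWideView(pinned));
    }
}
```

In this toy setup the summed per-instance figure is 16384 bytes, while the
cache actually holds 12288 bytes of pinned blocks.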
> *Solution:* Call lrucache.getPinnedUsage()
> ([https://github.com/facebook/rocksdb/blob/v9.8.4/java/src/main/java/org/rocksdb/Cache.java#L33C15-L33C29]
> in OSS RocksDB) instead, to get the actual memory size of pinned blocks in
> the shared cache. This queries the cache itself rather than an individual DB.
> To estimate the pinned-block memory used by a single DB instance, we can
> divide lrucache.getPinnedUsage() by num_rocksdb_instances_sharing_the_cache.
> The same approach is already used for memoryUsage. See:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBMemoryManager.scala#L124]
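A minimal sketch of the proposed per-instance estimate, assuming the
cache-wide pinned usage has already been read from the shared cache. The
class and method names are hypothetical, the zero-instance guard is an
assumption rather than something the ticket specifies, and the LongSupplier
stands in for the real lrucache.getPinnedUsage() call so the example runs
without the RocksDB library.

```java
import java.util.function.LongSupplier;

public class PinnedUsageEstimate {
    // Splits the cache-wide pinned usage evenly across the instances sharing the cache.
    static long perInstance(long cachePinnedUsageBytes, long numInstancesSharingCache) {
        if (numInstancesSharingCache <= 0) {
            return 0L; // hypothetical guard: no instances registered, nothing to attribute
        }
        return cachePinnedUsageBytes / numInstancesSharingCache;
    }

    public static void main(String[] args) {
        // Stand-in for lrucache.getPinnedUsage(); a real caller would query the shared cache.
        LongSupplier cachePinnedUsage = () -> 12_288L;
        long estimate = perInstance(cachePinnedUsage.getAsLong(), 3);
        System.out.println("estimated pinned bytes per instance: " + estimate);
    }
}
```

With this split, adding up the per-instance estimates across an executor
gives back the cache-wide total (up to integer truncation) instead of a
multiple of it.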
> *Original fix:* OSS Spark PR:
> https://github.com/apache/spark/commit/35c299a1e3e373e20ae45d7604df51c83ff1dbe2
--
This message was sent by Atlassian Jira
(v8.20.10#820010)