[ https://issues.apache.org/jira/browse/FLINK-32643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746180#comment-17746180 ]
Hangxiang Yu commented on FLINK-32643:
--------------------------------------
Hi, thanks for the proposal.
I have some questions about it, PTAL:
{quote}we cannot set it too large, such as 512M, this may cause OOM
{quote}
We also use a large block cache size in our production env, and it works well
in most cases.
Do you mean that the lack of a strict capacity limit on RocksDB memory usage
[1] may cause OOM?
{quote}and each DB cannot effectively utilize memory
{quote}
Memory sharing between RocksDB instances has already been implemented [2].
Could this help to resolve the problem?
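To make the sizing concern concrete, here is a rough sketch in plain Java (no Flink or RocksDB APIs involved) of how total block cache memory grows when every DB instance owns its own cache, compared with one budget shared by all instances. The 8M and 512M sizes come from the issue description; the instance count and the 2 GiB off-heap budget are made-up numbers for illustration only.

```java
// Back-of-the-envelope comparison of per-instance block caches and one shared cache.
// All names here are hypothetical helpers, not Flink or RocksDB APIs.
class BlockCacheBudget {
    static final long MIB = 1024L * 1024L;

    /** Total cache memory when each RocksDB instance allocates its own block cache. */
    static long perInstanceTotal(int dbInstances, long cacheSizePerDb) {
        return Math.multiplyExact((long) dbInstances, cacheSizePerDb);
    }

    /** Total cache memory when all instances share one cache of a fixed size. */
    static long sharedTotal(int dbInstances, long sharedCacheSize) {
        // Independent of the number of instances; they compete for the same budget.
        return sharedCacheSize;
    }

    static boolean exceeds(long required, long budget) {
        return required > budget;
    }

    public static void main(String[] args) {
        int dbInstances = 8;               // assumption: stateful operators in one TM
        long offHeapBudget = 2048 * MIB;   // assumption: memory available for caches

        long withDefault = perInstanceTotal(dbInstances, 8 * MIB);
        long withLarge = perInstanceTotal(dbInstances, 512 * MIB);
        long shared = sharedTotal(dbInstances, offHeapBudget);

        System.out.println("per-instance, 8M each:   " + withDefault / MIB + " MiB, over budget: "
                + exceeds(withDefault, offHeapBudget));
        System.out.println("per-instance, 512M each: " + withLarge / MIB + " MiB, over budget: "
                + exceeds(withLarge, offHeapBudget));
        System.out.println("shared cache:            " + shared / MIB + " MiB, over budget: "
                + exceeds(shared, offHeapBudget));
    }
}
```

With independent caches the total scales linearly with the number of DB instances, so a large per-DB value only stays safe if something bounds the sum; a shared cache bounds the sum by construction.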
{quote}introduce off-heap shared state cache across multiple db instances for
stateful operators in TM.
{quote}
What's the cache type? Is it a read-write cache or a read-only cache? And
what's the data structure?
What are the cache strategy and granularity? Caching at the record level may
increase the overhead per record.
Could this also increase the space overhead compared to Block Cache (due to
compression)?
Maybe I missed some of the details. I'm also interested in this and would like
to learn more.
[1] https://issues.apache.org/jira/browse/FLINK-15532
[2] https://issues.apache.org/jira/browse/FLINK-29928
> Introduce off-heap shared state cache across stateful operators in TM
> ---------------------------------------------------------------------
>
> Key: FLINK-32643
> URL: https://issues.apache.org/jira/browse/FLINK-32643
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Affects Versions: 1.19.0
> Reporter: Fang Yong
> Priority: Major
>
> Currently each stateful operator will create an independent db instance if it
> uses rocksdb as state backend, and we can configure
> `state.backend.rocksdb.block.cache-size` for each db to speed up state
> performance. This parameter defaults to 8M, and we cannot set it too large,
> such as 512M, this may cause OOM and each DB cannot effectively utilize
> memory. To address this issue, we would like to introduce off-heap shared
> state cache across multiple db instances for stateful operators in TM.