[
https://issues.apache.org/jira/browse/FLINK-20496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
YufeiLiu updated FLINK-20496:
-----------------------------
Description:
When using RocksDBStateBackend with
{{state.backend.rocksdb.memory.managed}} and
{{state.backend.rocksdb.memory.fixed-per-slot}} enabled, Flink strictly limits
RocksDB memory usage, which covers the "write buffer" and the "block cache". With
these options RocksDB stores index and filter blocks in the block cache, because
with the default options the memory used by index/filters can grow without bound.
But this leads to another issue: if the high-priority cache (configured by
{{state.backend.rocksdb.memory.high-prio-pool-ratio}}) can't fit all
index/filter blocks, RocksDB loads all metadata from disk on a cache miss,
and the program becomes extremely slow. According to [Partitioned Index
Filters|https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters][1],
we can enable a two-level index, which gives acceptable performance when the
index/filter cache misses.
Enabling these options made my case over 10x faster[2]. I think we can
add an option {{state.backend.rocksdb.partitioned-index-filters}} with a default
value of false, so that this feature is easy to use.
[1] Partitioned Index Filters:
https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters
[2] Deduplicate scenario, state.backend.rocksdb.memory.fixed-per-slot=256M,
SSD, elapsed time 4.91ms -> 0.33ms.
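To illustrate why a two-level index helps on a cache miss, here is a rough back-of-envelope model in Python. It is only a sketch: the function name and both sizes (a 2 MiB index per SST file, 4 KiB index partitions) are hypothetical values chosen for illustration, not measurements from the scenario in [2], and it assumes the small top-level index is already in memory.

```python
def index_bytes_read_on_miss(index_bytes: int, partition_bytes: int, partitioned: bool) -> int:
    """Bytes of index data fetched from disk for one lookup that misses the block cache."""
    if not partitioned:
        # Monolithic index: the whole index block of the SST file is loaded.
        return index_bytes
    # Two-level index: only the partition covering the key is loaded
    # (assumes the top-level index is already in memory).
    return min(partition_bytes, index_bytes)


index_bytes = 2 * 1024 * 1024  # hypothetical: 2 MiB index for one SST file
partition_bytes = 4 * 1024     # hypothetical: 4 KiB per index partition

full = index_bytes_read_on_miss(index_bytes, partition_bytes, partitioned=False)
part = index_bytes_read_on_miss(index_bytes, partition_bytes, partitioned=True)
print(full, part, full // part)  # prints "2097152 4096 512"
```

Under these assumed sizes each miss reads far less data with partitioning, which is consistent in direction with the speedup reported above. The real gain depends on index size, cache size, and access pattern.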
> RocksDB partitioned index filter option
> ---------------------------------------
>
> Key: FLINK-20496
> URL: https://issues.apache.org/jira/browse/FLINK-20496
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Reporter: YufeiLiu
> Priority: Major