[
https://issues.apache.org/jira/browse/FLINK-30475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653585#comment-17653585
]
David Hrbacek commented on FLINK-30475:
---------------------------------------
Hello [~Yanfei Lei] [~masteryhx] ,
Sorry for the late response.
I do not expect that the performance results with {{deleteRange()}} have changed since
FLINK-28010.
But let me describe our use-case scenario:
* We write extensive keyed map state,
* where the state basically represents a queue of elements waiting for
processing.
* This state is periodically cleared (e.g. after 3 days), with millions of
items at a time.
When {{MapState#clear()}} with an iterator is called, clearing the state
lasts for hours(!), with occasional checkpoint failures and a big degradation of
pipeline performance.
So the small performance drop mentioned in the
[blog|https://rocksdb.org/blog/2018/11/21/delete-range.html] is not really a
problem in comparison to {{clear()}} with scan-and-delete.
On the other hand, I admit that for most use cases the performance drop can be
problematic.
As I understand the problem:
* {{deleteRange()}} is better for clearing map states with a large number of keys,
* whereas clearing via scan-and-delete is optimal for map states with a small number
of keys.
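To illustrate why {{deleteRange}} scales so well: it only needs the {{[begin, end)}} bounds of the serialized key prefix, independent of how many entries fall inside the range. A minimal sketch of computing the exclusive upper bound for a key prefix (a hypothetical standalone helper, not actual Flink code):

```java
import java.util.Arrays;

public class PrefixRange {
    /**
     * Computes the smallest byte array strictly greater than every key
     * that starts with the given prefix, i.e. the exclusive upper bound
     * usable as the "end" argument of deleteRange(prefix, end).
     * Returns null if no finite bound exists (prefix is all 0xFF bytes).
     */
    public static byte[] upperBound(byte[] prefix) {
        byte[] bound = Arrays.copyOf(prefix, prefix.length);
        for (int i = bound.length - 1; i >= 0; i--) {
            if (bound[i] != (byte) 0xFF) {
                bound[i]++;                         // bump last non-0xFF byte
                return Arrays.copyOf(bound, i + 1); // drop trailing bytes
            }
        }
        return null; // unbounded above: every longer key still matches
    }
}
```

With such a bound, clearing the whole map state for one key is a single tombstone write instead of a scan over millions of entries.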
This leads me to a solution proposal where the {{RocksDBMapState#clear()}}
implementation can be chosen via configuration.
I can imagine two ways to implement a configurable clear operation:
* Switchable: e.g. a config option {{state.backend.rocksdb.map-state.clear-op}} with
the default value {{scan-and-delete}} and the optional value {{delete-range}}.
* With a threshold: e.g. a config option
{{state.backend.rocksdb.map-state.clear-with-delete-range-threshold}} where the
user can configure the map state size from which {{delete-range}} will be used.
The default value would be {{-1}}, which means never use
{{delete-range}}.
What do you think?
> Improved speed of RocksDBMapState clear() using rocksDB.deleteRange
> -------------------------------------------------------------------
>
> Key: FLINK-30475
> URL: https://issues.apache.org/jira/browse/FLINK-30475
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Affects Versions: 1.16.0
> Reporter: David Hrbacek
> Priority: Major
> Labels: pull-request-available
>
> Currently {{RocksDBMapState#clear()}} is processed by traversing the key range
> and inserting the individual keys into a WriteBatch for deletion.
> RocksDB offers a much faster way to delete a key range: {{deleteRange}}.
> This issue is a follow-up to
> [FLINK-9070|https://issues.apache.org/jira/browse/FLINK-9070], where
> {{deleteRange}} was also considered. But at that time it implied slower reads,
> it was buggy, and it was not even available in the Java API of RocksDB. All of
> these problems have since been solved (see also the RocksDB [blog article for
> deleteRange|https://rocksdb.org/blog/2018/11/21/delete-range.html]).
> {{deleteRange}} enables clearing {{RocksDBMapState}} for one key in constant
> computational complexity, whereas the old solution requires O(n).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)