[ 
https://issues.apache.org/jira/browse/FLINK-30475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653585#comment-17653585
 ] 

David Hrbacek commented on FLINK-30475:
---------------------------------------

Hello [~Yanfei Lei] [~masteryhx] ,

Sorry for the late response.

I do not expect that the performance results with {{deleteRange()}} have changed since 
FLINK-28010.

But to describe our use-case scenario:
* We write an extensive keyed map state,
* where the state basically represents a queue of elements waiting for processing.
* This state is periodically cleared (e.g. after 3 days), with millions of items.

When {{MapState#clear()}} with the iterator is called, clearing the state takes hours(!), 
with occasional checkpoint failures and a big degradation of pipeline performance.
So the small performance drop mentioned in the 
[blog|https://rocksdb.org/blog/2018/11/21/delete-range.html] is not really a 
problem compared to {{clear()}} with scan-and-delete.
On the other hand, I admit that for most use cases the performance drop can be 
problematic.

As I understand the problem:
* {{deleteRange()}} is better for clearing map states with a large number of keys,
* whereas clearing via scan-and-delete is optimal for map states with a small number 
of keys (both strategies are sketched below).
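
For illustration only, a minimal sketch of the two strategies at the plain RocksJava level (this is not the actual Flink implementation; the key-prefix bounds, class name, and error handling are simplified assumptions):

{code:java}
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

public final class ClearStrategiesSketch {

    /**
     * Scan-and-delete: iterate all keys in [prefixBegin, prefixEnd) and delete them
     * through a WriteBatch. Cost grows with the number of entries (O(n)).
     */
    static void scanAndDelete(RocksDB db, ColumnFamilyHandle cf,
                              byte[] prefixBegin, byte[] prefixEnd) throws RocksDBException {
        try (WriteBatch batch = new WriteBatch();
             WriteOptions writeOptions = new WriteOptions();
             RocksIterator iter = db.newIterator(cf)) {
            for (iter.seek(prefixBegin); iter.isValid() && isBefore(iter.key(), prefixEnd); iter.next()) {
                batch.delete(cf, iter.key());
            }
            db.write(writeOptions, batch);
        }
    }

    /**
     * deleteRange: a single range tombstone covering [prefixBegin, prefixEnd);
     * roughly constant cost at write time, at the price of some read overhead later.
     */
    static void deleteRange(RocksDB db, ColumnFamilyHandle cf,
                            byte[] prefixBegin, byte[] prefixEnd) throws RocksDBException {
        db.deleteRange(cf, prefixBegin, prefixEnd);
    }

    /** Unsigned lexicographic comparison: true if key < end. */
    private static boolean isBefore(byte[] key, byte[] end) {
        int len = Math.min(key.length, end.length);
        for (int i = 0; i < len; i++) {
            int a = key[i] & 0xFF;
            int b = end[i] & 0xFF;
            if (a != b) {
                return a < b;
            }
        }
        return key.length < end.length;
    }
}
{code}

With {{deleteRange}}, the clear itself is a single range tombstone, while the cost of scan-and-delete grows with the number of entries in the map state.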

This leads me to a proposal where the {{RocksDBMapState#clear()}} implementation can be 
chosen via configuration.

I can imagine two ways to implement a configurable clear operation (a sketch of the 
threshold variant follows below):
* Switchable: e.g. a config option {{state.backend.rocksdb.map-state.clear-op}} with the 
default value {{scan-and-delete}} and the alternative {{delete-range}}.
* With a threshold: e.g. a config option 
{{state.backend.rocksdb.map-state.clear-with-delete-range-threshold}} where the user can 
configure the map-state size from which {{delete-range}} is used. The default value 
would be {{-1}}, meaning never use {{delete-range}}.
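
A minimal sketch of the threshold variant. The option key comes from the proposal above, but the class, the description text, and the assumption that a map-state size estimate is available are illustrations only, not existing Flink API:

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

/** Hypothetical holder for the proposed option; not part of Flink. */
public final class ProposedRocksDBMapStateOptions {

    /**
     * Map states with at least this many entries would be cleared via RocksDB deleteRange;
     * -1 (the proposed default) keeps the current scan-and-delete behaviour.
     */
    public static final ConfigOption<Long> CLEAR_WITH_DELETE_RANGE_THRESHOLD =
            ConfigOptions.key("state.backend.rocksdb.map-state.clear-with-delete-range-threshold")
                    .longType()
                    .defaultValue(-1L)
                    .withDescription(
                            "Minimum (estimated) number of entries in a map state from which clear() "
                                    + "uses RocksDB deleteRange instead of scan-and-delete. "
                                    + "-1 disables deleteRange and keeps the current behaviour.");

    private ProposedRocksDBMapStateOptions() {}
}
{code}

{{RocksDBMapState#clear()}} would then use {{deleteRange}} when the (estimated) entry count reaches the configured threshold and keep the current scan-and-delete path otherwise.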

What do you think?
 

> Improved speed of RocksDBMapState clear() using rocksDB.deleteRange
> -------------------------------------------------------------------
>
>                 Key: FLINK-30475
>                 URL: https://issues.apache.org/jira/browse/FLINK-30475
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.16.0
>            Reporter: David Hrbacek
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, {{RocksDBMapState#clear()}} is processed by traversing the key range and 
> inserting the individual keys into a WriteBatch for deletion.
> RocksDB offers a much faster way to delete a key range: {{deleteRange}}.
> This issue is a follow-up to 
> [FLINK-9070|https://issues.apache.org/jira/browse/FLINK-9070], where 
> {{deleteRange}} was also considered. At that time, however, it implied slower reads, 
> it was buggy, and it was not even available in the Java API of RocksDB. All of these 
> problems have been solved since then (see also the RocksDB [blog article on 
> deleteRange|https://rocksdb.org/blog/2018/11/21/delete-range.html]).
> {{deleteRange}} makes it possible to clear the {{RocksDBMapState}} for one key in 
> constant computational complexity, whereas the old solution requires O(n).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
