Joey Pereira created FLINK-21321:
------------------------------------
Summary: Change RocksDB incremental checkpoint re-scaling to use
deleteRange
Key: FLINK-21321
URL: https://issues.apache.org/jira/browse/FLINK-21321
Project: Flink
Issue Type: Improvement
Components: Runtime / State Backends
Reporter: Joey Pereira
InĀ FLINK-8790, it was suggested to use RocksDB's {{deleteRange}} API to more
efficiently clip the databases for the desired target group.
During the PR for that ticket,
[#5582|https://github.com/apache/flink/pull/5582], the change did not end up
using the {{deleteRange}} method as it was an experimental feature in RocksDB.
At this point {{deleteRange}} is in a far less experimental state now but I
believe is still formally "experimental". It is heavily by many others like
CockroachDB and TiKV and they have teased out several bugs in complex
interactions over the years.
For certain re-scaling situations where restores trigger {{restoreWithScaling}}
and the DB clipping logic, this would likely reduce an O(n) operation (N =
state size/records) to O(1). For large state apps, this would potentially
represent a non-trivial amount of time spent for re-scaling. In the case of my
workplace, we have an operator with 100s of billions of records in state and
re-scaling was taking a long time (>>30min, but it has been awhile since doing
it).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)