[ https://issues.apache.org/jira/browse/FLINK-21321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joey Pereira updated FLINK-21321:
---------------------------------
    Description: 
In FLINK-8790, it was suggested to use RocksDB's {{deleteRange}} API to clip 
the databases to the desired target key-group range more efficiently.

In the PR for that ticket, 
[#5582|https://github.com/apache/flink/pull/5582], the change ended up not 
using the {{deleteRange}} method, as it was an experimental feature in RocksDB 
at the time.

{{deleteRange}} is in a far less experimental state now, though I believe it is 
still formally marked "experimental". It is heavily used by other projects such 
as CockroachDB and TiKV, which have teased out several bugs in complex 
interactions over the years.

For certain re-scaling situations where restores trigger {{restoreWithScaling}} 
and the DB clipping logic, this would likely reduce an O(n) operation (n = 
state size, in records) to O(1). For applications with large state, this could 
represent a non-trivial amount of time spent re-scaling. At my workplace, we 
have an operator with hundreds of billions of records in state, and re-scaling 
was taking a long time (well over 30 minutes, though it has been a while since 
we last did it).
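
To illustrate the difference, here is a rough sketch (not the actual Flink 
restore code; the class and method names are hypothetical, and it assumes the 
clipping boundaries are available as serialized begin/end keys). The current 
approach scans the range and deletes each record individually, while 
{{deleteRange}} writes a single range tombstone:

{code:java}
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;

public class ClipSketch {

    /** Current approach: visit and delete every record in [beginKey, endKey), O(n). */
    static void clipByIteration(
            RocksDB db, ColumnFamilyHandle handle, byte[] beginKey, byte[] endKey)
            throws RocksDBException {
        try (RocksIterator iterator = db.newIterator(handle)) {
            iterator.seek(beginKey);
            while (iterator.isValid() && compare(iterator.key(), endKey) < 0) {
                db.delete(handle, iterator.key());
                iterator.next();
            }
        }
    }

    /** Proposed approach: one range tombstone covering [beginKey, endKey), roughly O(1). */
    static void clipByDeleteRange(
            RocksDB db, ColumnFamilyHandle handle, byte[] beginKey, byte[] endKey)
            throws RocksDBException {
        db.deleteRange(handle, beginKey, endKey);
    }

    /** Unsigned lexicographic comparison, matching RocksDB's default key ordering. */
    private static int compare(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
{code}

The deletion itself becomes roughly constant-time; the cost of the range 
tombstone is instead paid during later reads and compaction, which is part of 
why the feature was long treated as experimental.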

> Change RocksDB incremental checkpoint re-scaling to use deleteRange
> -------------------------------------------------------------------
>
>                 Key: FLINK-21321
>                 URL: https://issues.apache.org/jira/browse/FLINK-21321
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: Joey Pereira
>            Priority: Minor
>


