[ https://issues.apache.org/jira/browse/FLINK-32953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hangxiang Yu reassigned FLINK-32953:
------------------------------------
Assignee: Jinzhong Li
> [State TTL] Resolve data correctness problem after TTL is changed
> ------------------------------------------------------------------
>
> Key: FLINK-32953
> URL: https://issues.apache.org/jira/browse/FLINK-32953
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Reporter: Jinzhong Li
> Assignee: Jinzhong Li
> Priority: Major
>
> Because expired data is cleaned up in the background on a best-effort basis
> (the hashmap backend uses the INCREMENTAL_CLEANUP strategy, RocksDB uses the
> ROCKSDB_COMPACTION_FILTER strategy), some expired state is often persisted
> into snapshots.
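>
> For reference, both cleanup strategies are opt-in via StateTtlConfig. A
> minimal sketch, assuming a 12-hour TTL (the parameter values here are
> illustrative, not taken from this issue):
> {code:java}
> import org.apache.flink.api.common.state.StateTtlConfig;
> import org.apache.flink.api.common.time.Time;
>
> StateTtlConfig ttlConfig = StateTtlConfig
>         .newBuilder(Time.hours(12))
>         // hashmap/heap backend: clean up to 10 expired entries per state access
>         .cleanupIncrementally(10, false)
>         // rocksdb backend: the compaction filter re-reads the current
>         // timestamp after processing every 1000 entries
>         .cleanupInRocksdbCompactFilter(1000)
>         .build();
> {code}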
>
> In some scenarios, a user changes the state TTL of a job and then restores
> the job from old state. If the user increases the state TTL (e.g., from 12
> hours to 24 hours), expired data that was not yet cleaned up becomes alive
> again after restore. This is clearly unreasonable and may break data
> regulatory requirements.
>
> In particular, the RocksDB state backend may cause data correctness problems
> due to level compaction in this case. (e.g., one key has two versions, at
> level-1 and level-2, both of which are TTL-expired. The level-1 version is
> then cleaned up by compaction while the level-2 version is not. If we
> increase the state TTL and restart the job, the stale level-2 version
> becomes valid again after restore.)
>
> To solve this problem, I think we can:
> 1) persist the old state TTL into the snapshot meta info (e.g.,
> RegisteredKeyValueStateBackendMetaInfo or others);
> 2) during state restore, compare the current TTL with the old TTL;
> 3) if the current TTL is longer than the old TTL, iterate over all data,
> filter out entries that are expired under the old TTL, and write the valid
> data into the state backend (a rough sketch follows below).
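>
> A rough sketch of the filtering in step 3, assuming a full iteration over
> the restored entries (restoredStateEntries, oldTtl and newStateBackend are
> hypothetical stand-ins for the actual restore internals; TtlValue is Flink's
> internal TTL wrapper around a user value and its last-access timestamp):
> {code:java}
> long now = System.currentTimeMillis();
> for (Map.Entry<K, TtlValue<V>> entry : restoredStateEntries) {
>     long lastAccess = entry.getValue().getLastAccessTimestamp();
>     // Expired under the OLD ttl: drop it so it cannot come back to life
>     // under the new, longer ttl.
>     if (lastAccess + oldTtl.toMilliseconds() <= now) {
>         continue;
>     }
>     // Still valid under the old ttl: write it into the new state backend.
>     newStateBackend.put(entry.getKey(), entry.getValue());
> }
> {code}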
--
This message was sent by Atlassian Jira
(v8.20.10#820010)