[
https://issues.apache.org/jira/browse/FLINK-25458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Khachatryan updated FLINK-25458:
--------------------------------------
Description:
Currently, the changelog state backend doesn't support local recovery, so
recovery times might be sub-optimal.
Materialized state issues:
Periodic materialization calls the state backend's snapshot method with a
materialization ID. Local state management, however, relies on the checkpoint
ID for storing, confirming, and discarding local state. This mismatch between
the two IDs would break local recovery.
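The ID mismatch can be illustrated with a toy model. This is only a sketch under stated assumptions: `LocalStoreIdMismatch` and its methods are hypothetical names, not Flink classes, and the "discard everything below the confirmed ID" rule is an assumed simplification of how a checkpoint-ID-keyed local store might prune.

```java
import java.util.TreeMap;

/** Toy local state store keyed by one ID space (hypothetical, not a Flink class). */
class LocalStoreIdMismatch {
    // ID -> local snapshot; the store assumes every ID is a checkpoint ID.
    private final TreeMap<Long, String> snapshots = new TreeMap<>();

    /** Stores a local snapshot under the given ID. */
    public void store(long id, String snapshot) {
        snapshots.put(id, snapshot);
    }

    /** Confirms a checkpoint and discards every snapshot stored under a smaller ID. */
    public void confirm(long checkpointId) {
        snapshots.headMap(checkpointId, false).clear();
    }

    public boolean contains(long id) {
        return snapshots.containsKey(id);
    }

    public static void main(String[] args) {
        LocalStoreIdMismatch store = new LocalStoreIdMismatch();
        long materializationId = 3L; // ID passed in by periodic materialization
        long checkpointId = 10L;     // checkpoint that still builds on materialization 3

        store.store(materializationId, "materialized-state");
        store.confirm(checkpointId);

        // The materialized state is gone although checkpoint 10 still needs it.
        System.out.println(store.contains(materializationId)); // prints "false"
    }
}
```

In this model the store cannot tell that the snapshot written under ID 3 still backs checkpoint 10, so confirming the checkpoint discards state that local recovery would need.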
Non-materialized state issues:
* non-materialized state (i.e. the changelog) is shared across checkpoints and
therefore needs some tracking (in the TM or via hard links in the FS)
* the writer does not enforce a boundary between checkpoints (when writing to
DFS); if the local stream simply duplicates the DFS stream, it would break on
cleanup
* files can be shared across tasks, which would also break on cleanup
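One possible form of the in-TM tracking mentioned above is reference counting per file. The sketch below is an assumption for illustration only: `SharedChangelogTracker`, `register`, and `discard` are hypothetical names, not part of Flink. The owner here is a checkpoint ID; the same idea could apply if the owner were a task.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy tracker for changelog files shared across checkpoints (hypothetical, not a Flink class). */
class SharedChangelogTracker {
    // File name -> checkpoint IDs that still reference the file.
    private final Map<String, Set<Long>> users = new HashMap<>();

    /** Records that the given checkpoint references the given local file. */
    public void register(long checkpointId, String file) {
        users.computeIfAbsent(file, f -> new HashSet<>()).add(checkpointId);
    }

    /** Drops the checkpoint's references and returns the files that became safe to delete. */
    public Set<String> discard(long checkpointId) {
        Set<String> deletable = new HashSet<>();
        users.entrySet().removeIf(entry -> {
            entry.getValue().remove(checkpointId);
            if (entry.getValue().isEmpty()) {
                deletable.add(entry.getKey());
                return true; // no checkpoint references this file any more
            }
            return false;
        });
        return deletable;
    }

    public static void main(String[] args) {
        SharedChangelogTracker tracker = new SharedChangelogTracker();
        tracker.register(1L, "log-a");
        tracker.register(2L, "log-a"); // log-a is shared by checkpoints 1 and 2
        tracker.register(2L, "log-b");

        // Discarding checkpoint 1 must not delete log-a: checkpoint 2 still uses it.
        System.out.println(tracker.discard(1L).isEmpty()); // prints "true"
        // Discarding checkpoint 2 releases both files.
        System.out.println(tracker.discard(2L).size());    // prints "2"
    }
}
```

A file is reported as deletable only once its last referencing checkpoint is discarded, which is the property that naive per-checkpoint cleanup lacks.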
was:
Currently, changelog state-backend doesn't support local recovery. Thus,
recovery times might be sub-optimal.
Current period materialization would call state backend snapshot method with a
materialization id. However, current local state managment would rely on
checkpoint id as storing, confirming and discarding. The gap between them would
break how local recovery works.
> Support local recovery
> ----------------------
>
> Key: FLINK-25458
> URL: https://issues.apache.org/jira/browse/FLINK-25458
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Checkpointing, Runtime / State Backends
> Reporter: Yun Tang
> Priority: Major