[ https://issues.apache.org/jira/browse/FLINK-26079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490751#comment-17490751 ]
Roman Khachatryan commented on FLINK-26079:
-------------------------------------------
[~dwysakowicz]
{quote}Do I understand it correctly, that the use case that breaks is basically
changing the state backend from a non-changelog to a changelog state backend?
{quote}
Yes. Recovering from a non-changelog checkpoint (not a savepoint) is desirable;
the motivation is to reduce downtime.
[~pnowojski]
{quote}DataSourceTask is a legacy DataSet API class. We can safely limit
ourselves just to StreamTask.
StreamTask#createStateBackend or
StateBackendLoader#fromApplicationOrConfigOrDefault could be that one place.
{quote}
You're right regarding the DataSourceTask; I mistook it for the FLIP-27 task.
However, the state backend is also created by StreamOperatorContextBuilder
(called by operators). Shouldn't the check be there as well?
{quote}I don't like that we would have to pass the restore mode to implement
such temporary check, but I don't know what's the alternative?
{quote}
I don't see an alternative either, and I'm not sure we should implement the
validation at all.
I see the following alternatives:
1. Fix the original issue
2. Only document the limitation without enforcing it
3. Disallow recovery from non-changelog checkpoints (only allow savepoints, as
Dawid mentioned)
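To make the discussed validation concrete, here is a minimal sketch of the condition from the issue title (Changelog backend + CLAIM restore mode + non-changelog checkpoint). All names below (ChangelogRestoreCheck, isSupported, validate, and the local RestoreMode enum) are hypothetical stand-ins for illustration, not existing Flink classes or methods, and where such a check would actually live (StreamTask, StateBackendLoader, StreamOperatorContextBuilder) is still open.

```java
public class ChangelogRestoreCheck {

    /** Hypothetical stand-in for the restore mode setting; not Flink's own class. */
    public enum RestoreMode { CLAIM, NO_CLAIM, LEGACY }

    /**
     * Returns false only for the combination named in the issue title:
     * changelog backend enabled, CLAIM restore mode, and a restored
     * checkpoint that was not written by the changelog backend.
     */
    public static boolean isSupported(
            boolean changelogEnabled,
            RestoreMode restoreMode,
            boolean restoredFromChangelogCheckpoint) {
        boolean unsupported =
                changelogEnabled
                        && restoreMode == RestoreMode.CLAIM
                        && !restoredFromChangelogCheckpoint;
        return !unsupported;
    }

    /** Fails fast with a descriptive message when the combination is unsupported. */
    public static void validate(
            boolean changelogEnabled,
            RestoreMode restoreMode,
            boolean restoredFromChangelogCheckpoint) {
        if (!isSupported(changelogEnabled, restoreMode, restoredFromChangelogCheckpoint)) {
            throw new IllegalStateException(
                    "Changelog backend cannot be combined with CLAIM restore mode "
                            + "when recovering from a non-changelog checkpoint");
        }
    }

    public static void main(String[] args) {
        // Allowed: same backend switch, but the checkpoint is not claimed.
        validate(true, RestoreMode.NO_CLAIM, false);

        // Rejected: the combination this issue is about.
        try {
            validate(true, RestoreMode.CLAIM, false);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Note that the check needs the restore mode as an input, which is exactly the extra plumbing questioned above.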
As for fixing the original issue (cc: [~yunta]):
1. Register all state with the SharedStateRegistry. This would require changing
registerSharedStates() of at least KeyGroupsStateHandle and
IncrementalRemoteKeyedStateHandle
2. Limit the above to the initial checkpoint only, and to recovery only
(CompletedCheckpoint.registerSharedStatesAfterRestored)
3. Wrap the materialized state with Changelog handles on the JM during recovery
(not an option IMO, because the JM shouldn't be aware of that)
> Disallow combination of Changelog backend with CLAIM restore mode when
> recovering from non-changelog checkpoint
> ---------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-26079
> URL: https://issues.apache.org/jira/browse/FLINK-26079
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Configuration, Runtime / State Backends
> Reporter: Roman Khachatryan
> Assignee: Roman Khachatryan
> Priority: Blocker
> Fix For: 1.15.0
>
>
> Extracted from FLINK-25872.