masteryhx commented on PR #19142:
URL: https://github.com/apache/flink/pull/19142#issuecomment-1091815704

> Thanks for the PR @masteryhx
>
> I have some concerns about approach taken, i.e. creating and destroying a full-fledged `ChangelogKeyedStateBackend`: (PCIIW)
>
> 1. `StateChangelogStorageFactory` might not be available after recovery (it is necessary to create changelog writer)
> 2. Some parts of the Changelog backend initialized during the recovery will be lost once it's closed, for example `keyValueStatesByName`. As a result, the nested backend will re-create its state objects.
> 3. Delegating functions will not be updating, IIUC (see `ChangelogKeyedStateBackend.functionDelegationHelper`)
>
> While running the tests, I see that after recovery no checkpoint (and likely processing) is performed; tasks immediately switch from `RUNNING` to `FINISHED`.
>
> An alternative approach would be to extract code responsible for applying changes and run it directly, without creating `ChangelogKeyedStateBackend`. Most of the code is already extracted in the form of `ChangelogBackendRestoreOperation` and ChangeAppliers. That seems more flexible and less fragile. WDYT?

Thanks for the suggestion!
I think the 1st problem also exists in your solution.
For the 2nd problem, you are right and I have not found a great way to resolve it.
So I just adopt your solution.
