Myasuka commented on a change in pull request #15420:
URL: https://github.com/apache/flink/pull/15420#discussion_r657804934
##########
File path:
flink-state-backends/flink-statebackend-changelog/src/main/java/org/apache/flink/state/changelog/ChangelogKeyedStateBackend.java
##########
@@ -369,7 +509,73 @@ public void notifyCheckpointAborted(long checkpointId)
throws Exception {
// Factory function interface
private interface StateFactory {
<K, N, SV, S extends State, IS extends S> IS create(
- InternalKvState<K, N, SV> kvState, KvStateChangeLogger<SV, N>
changeLogger)
+ InternalKvState<K, N, SV> kvState,
+ KvStateChangeLogger<SV, N> changeLogger,
+ InternalKeyContext<K> keyContext)
throws Exception;
}
+
+ /**
+ * @param name state name
+ * @param type state type (the only supported type currently are: {@link
+ * BackendStateType#KEY_VALUE key value}, {@link
BackendStateType#PRIORITY_QUEUE priority
+ * queue})
+ * @return an existing state, i.e. the one that was already created
+ * @throws NoSuchElementException if the state wasn't created
+ * @throws UnsupportedOperationException if state type is not supported
+ */
+ public ChangelogState getExistingState(String name, BackendStateType type)
+ throws NoSuchElementException, UnsupportedOperationException {
+ ChangelogState state;
+ switch (type) {
+ case KEY_VALUE:
+ state = (ChangelogState) keyValueStatesByName.get(name);
Review comment:
Let me take `RocksDBKeyedStateBackend` for example.
Operations | Restoring state | User starts to call getOrCreateState for the
1st time
--|------------ | -------------
Normal RocksDBKeyedStateBackend|Restore from checkpoingts, not visiable for
users (not stored in `keyValueStatesByName`) | Might trigger state migration
and then visiable for users (stored in `keyValueStatesByName`)
Changelog (RocksDB) statebackend|Restore from checkpoingts, visiable for
users and then apply state changes ( (stored in `keyValueStatesByName`) | Just
get from `keyValueStatesByName` and cannot trigger state migration.
I think the first difference is that shall we make the state visiable for
users during restoring for changelog state backend? From my point of view, we
could still keep them not visiable so that we can then trigger state migration
for the 1st user access.
I agree that we could make state migration as a separate ticket, but I think
we should first discuss whether we should make previous state visiable in
`keyValueStatesByName` and that could be an easy fix in this PR.
If talking about state migration, I think we could have two choices:
1. Apply state changes during restore. And then migrate state in the 1st
user access. This might need data in changelog write twice.
2. Keep the state changes until user access the state during the 1st time.
Then we could apply state changes with possible state migration. This could
avoid uncessary re-write, but we need to keep those persist state changes.
Maybe we could have basic discussion here for these choices and leave TODO
in this PR if necessary.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]