[
https://issues.apache.org/jira/browse/FLINK-37069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927631#comment-17927631
]
Weijie Guo commented on FLINK-37069:
------------------------------------
Hi [~Zakelly], I have tested this following the instructions.
1. Check out and compile Flink at commit hash dd4bd434
2. Start a standalone Flink cluster
3. Set `execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION` in the Flink configuration
4. Run the Flink example
{code:bash}
./bin/flink run ./examples/streaming/StateMachineExample.jar \
--backend forst \
--checkpoint-dir file:///cp \
--incremental-checkpoints true
{code}
5. Confirm that a checkpoint is triggered and completed, then cancel the job
6. Restart from the latest checkpoint
{code:bash}
./bin/flink run -s file:///cp/ac252d10cfd0e70bc1142557f08132f4/chk-8 \
./examples/streaming/StateMachineExample.jar \
--backend forst \
--checkpoint-dir file:///cp \
--incremental-checkpoints true
{code}
But the job failed with the following exception:
{code:java}
Caused by: java.lang.IllegalArgumentException: Unsupported sharing files strategy for org.apache.flink.state.forst.snapshot.ForStIncrementalSnapshotStrategy : FORWARD
	at org.apache.flink.state.forst.snapshot.ForStIncrementalSnapshotStrategy.asyncSnapshot(ForStIncrementalSnapshotStrategy.java:146) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
	at org.apache.flink.state.forst.snapshot.ForStIncrementalSnapshotStrategy.asyncSnapshot(ForStIncrementalSnapshotStrategy.java:70) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
	at org.apache.flink.runtime.state.SnapshotStrategyRunner.snapshot(SnapshotStrategyRunner.java:80) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
	at org.apache.flink.state.forst.ForStKeyedStateBackend.snapshot(ForStKeyedStateBackend.java:484) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:281) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
{code}
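For anyone repeating step 6, the latest checkpoint path can be looked up rather than typed by hand. The following is only a minimal sketch, not part of Flink: the function name `latest_checkpoint` is hypothetical, it assumes a local `file://` checkpoint directory laid out as `<checkpoint-dir>/<job-id>/chk-<n>` (as in the path above), and it assumes that a completed checkpoint directory contains a `_metadata` file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Flink tool): print the newest completed
# checkpoint directory under a job's checkpoint directory on the local disk.
latest_checkpoint() {
  local job_dir="$1" best="" best_n=-1 dir n
  for dir in "$job_dir"/chk-*; do
    # Assumption: only completed checkpoints contain a _metadata file.
    [ -f "$dir/_metadata" ] || continue
    # Extract the numeric suffix after "chk-".
    n="${dir##*/chk-}"
    # Skip names whose suffix is empty or not purely numeric.
    case "$n" in ''|*[!0-9]*) continue ;; esac
    if [ "$n" -gt "$best_n" ]; then
      best_n="$n"
      best="$dir"
    fi
  done
  # Return non-zero when no completed checkpoint was found.
  [ -n "$best" ] || return 1
  printf '%s\n' "$best"
}

# Possible usage (paths taken from the steps above):
#   cp_path="$(latest_checkpoint /cp/ac252d10cfd0e70bc1142557f08132f4)" &&
#     ./bin/flink run -s "file://$cp_path" ./examples/streaming/StateMachineExample.jar \
#       --backend forst --checkpoint-dir file:///cp --incremental-checkpoints true
```

The comparison is numeric on purpose, so that `chk-10` is preferred over `chk-8`; a plain lexical sort of the directory names would pick the wrong one.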
> Cross-team verification for "Disaggregated State Management"
> ------------------------------------------------------------
>
> Key: FLINK-37069
> URL: https://issues.apache.org/jira/browse/FLINK-37069
> Project: Flink
> Issue Type: Sub-task
> Reporter: Xintong Song
> Assignee: Weijie Guo
> Priority: Blocker
> Fix For: 2.0.0
>
>
> Instructions:
> First of all, please read the related documents briefly (still under review;
> they will be replaced with formal links once merged):
> * Disaggregated State Management:
> [https://github.com/apache/flink/pull/26107/files#diff-bfa19e04bb5c3487c3e9bf514d61c0fa8bb973950fb0ad0e3d4a6898a99b83e3]
> * State V2:
> [https://github.com/apache/flink/pull/26107/files#diff-5d1147987fecbda329132403c1d92384575be220092995c4be491e12b8c50cc9]
> * ForSt State Backend:
> [https://github.com/apache/flink/pull/26107/files#diff-b7c52c06f6ed4d5af6f230d11ba23ea051bf4a08c589d98392143f080c468a87]
> For the SQL part, verification goes in FLINK-37068; we mainly focus on
> DataStream jobs and APIs here.
> 1. Make sure you are verifying this on the release-2.0 branch, since we have
> fixed several bugs since the rc0 package.
> 2. Choose one example in `flink-examples-streaming`. Most of the jobs have
> been rewritten using the new API. Here we take `StateMachineExample` as an
> example.
> 3. Compile and run `StateMachineExample` in a proper environment (I suggest a
> standalone session cluster or YARN), and make sure you pass the following
> command-line parameters:
> {code:bash}
> ./flink run xxxxxxxxx \
> --backend forst \
> --checkpoint-dir s3://your/cp/dir \
> --incremental-checkpoints true
> {code}
> Or set via `config.yaml`.
> {code:yaml}
> state.backend.type: forst
> execution.checkpointing.incremental: true
> execution.checkpointing.dir: s3://your-bucket/flink-checkpoints
> {code}
> 4. Check that the job is running smoothly and that the periodic checkpoints
> are successfully taken.
> 5. Stop the job and restart from the latest checkpoint.
> It would be great if you could write your own job using the State V2 API and
> follow Steps 3 to 5 above. It is important to check whether there are any
> bugs in the new State APIs.