[
https://issues.apache.org/jira/browse/FLINK-26274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507607#comment-17507607
]
Johnson Okorie commented on FLINK-26274:
----------------------------------------
Hi, I tested this feature from the master branch and it doesn't always work for
me. I tried to follow the same configuration as above, except that I had 3 TMs
with a parallelism of 3 (so 1 slot per TM). If I scale down by one TM and then
scale back up really quickly, it works fine. If I wait longer (1+ minute) before
scaling the TMs back up to 3, I can see that the TM re-offers the previous slot.
It also seems that the JM accepts the slot, but nothing happens from there.
After the slot timeout, the slot gets released and the TM offers a new slot,
triggering recovery from remote storage.
[^taskmanager-2.log] (Grepped slot allocation related logs)
(I am still new to Flink, so I might be doing something very wrong.)
> Test local recovery works across TaskManager process restarts
> -------------------------------------------------------------
>
> Key: FLINK-26274
> URL: https://issues.apache.org/jira/browse/FLINK-26274
> Project: Flink
> Issue Type: Technical Debt
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Till Rohrmann
> Assignee: Dawid Wysakowicz
> Priority: Blocker
> Labels: release-testing
> Fix For: 1.15.0
>
> Attachments: jobmanager_local_restore_2.log, taskmanager-2.log,
> taskmanager_flink-taskmanager-2_log
>
>
> This ticket is a testing task for
> [FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw].
> When enabling local recovery and configuring a working directory that can be
> re-read after a process failure, Flink should now be able to recover locally.
> We should test whether this is the case. Please take a look at the
> documentation [1, 2] to see how to configure Flink to make use of it.
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/working_directory/
> [2]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#enabling-local-recovery-across-pod-restarts