[
https://issues.apache.org/jira/browse/FLINK-36673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931839#comment-17931839
]
Alan Zhang commented on FLINK-36673:
------------------------------------
Thanks [~gyfora]. This fix seems to work for streaming jobs as well, even
though it was intended for batch jobs. I would like to test this patch. Has
version 1.11 been released?
I also wonder what the release cadence is for this operator; I didn't find any
related information in the operator docs. The latest release notes I found are
for 1.10, and I didn't find any for 1.11:
https://flink.apache.org/2024/10/25/apache-flink-kubernetes-operator-1.10.0-release-announcement/
> Operator is not properly handling failed deployments without savepoints
> -----------------------------------------------------------------------
>
> Key: FLINK-36673
> URL: https://issues.apache.org/jira/browse/FLINK-36673
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Reporter: Yaroslav Tkachenko
> Priority: Major
> Attachments: Screenshot 2025-02-28 at 4.15.26 PM.png, Screenshot
> 2025-02-28 at 8.51.37 PM.png, Screenshot 2025-02-28 at 8.55.36 PM.png,
> stacktrace.txt
>
>
> I noticed an issue after upgrading Flink Kubernetes Operator from 1.9 to 1.10.
> When I deploy a FlinkDeployment that fails during startup, I get a
> "ReconciliationException: Could not observe latest savepoint information"
> (full stacktrace is attached).
> I think the issue was introduced here:
> [https://github.com/apache/flink-kubernetes-operator/pull/871].
> *AbstractFlinkService.getLastCheckpoint* now throws a
> *ReconciliationException* when a savepoint is not available, and
> *SnapshotObserver.observeLatestCheckpoint* doesn't handle it properly. I
> think having no savepoint is completely normal in some situations (e.g. a
> brand new job).