[
https://issues.apache.org/jira/browse/FLINK-31203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693865#comment-17693865
]
hjw edited comment on FLINK-31203 at 2/27/23 8:03 AM:
------------------------------------------------------
[Gyula Fora|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gyfora]
Could you please take a look at this problem? Thanks.
was (Author: JIRAUSER280998):
@[Gyula Fora|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gyfora]
> Application upgrade rollbacks failed in Flink Kubernetes Operator
> -----------------------------------------------------------------
>
> Key: FLINK-31203
> URL: https://issues.apache.org/jira/browse/FLINK-31203
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.3.1
> Reporter: hjw
> Priority: Major
>
> I tested the application upgrade rollback feature, but it does not work:
> the Flink application-mode job is not rolled back to the last stable spec.
> As shown in the following example, I declared an invalid pod template without
> a container named flink-main-container to test the rollback feature.
> However, the deployment of the Flink application job simply failed with the
> error below, and no rollback took place.
>
> Error:
> org.apache.flink.client.deployment.ClusterDeploymentException: Could not
> create Kubernetes cluster "basic-example".
> at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:292)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure
> executing: POST at:
> https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments.
> Message: Deployment.apps "basic-example" is invalid:
> [spec.template.spec.containers[0].name: Required value,
> spec.template.spec.containers[0].image: Required value]. Received status:
> Status(apiVersion=v1, code=422,
> details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].name,
> message=Required value, reason=FieldValueRequired, additionalProperties={}),
> StatusCause(field=spec.template.spec.containers[0].image, message=Required
> value, reason=FieldValueRequired, additionalProperties={})], group=apps,
> kind=Deployment, name=basic-example, retryAfterSeconds=null, uid=null,
> additionalProperties={}), kind=Status, message=Deployment.apps
> "basic-example" is invalid: [spec.template.spec.containers[0].name: Required
> value, spec.template.spec.containers[0].image: Required value],
> metadata=ListMeta(_continue=null, remainingItemCount=null,
> resourceVersion=null, selfLink=null, additionalProperties={}),
> reason=Invalid, status=Failure, additionalProperties={}).
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)
>
> Env:
> Flink version: 1.16
> Flink Kubernetes Operator: 1.3.1
>
> *Last stable spec:*
> apiVersion: flink.apache.org/v1beta1
> kind: FlinkDeployment
> metadata:
>   name: basic-example
> spec:
>   image: flink:1.16
>   flinkVersion: v1_16
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>     kubernetes.operator.deployment.rollback.enabled: true
>     state.savepoints.dir: s3://flink-data/savepoints
>     state.checkpoints.dir: s3://flink-data/checkpoints
>     high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>     high-availability.storageDir: s3://flink-data/ha
>   serviceAccount: flink
>   podTemplate:
>     spec:
>       containers:
>         - name: flink-main-container
>           env:
>             - name: TZ
>               value: Asia/Shanghai
>   jobManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   taskManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   job:
>     jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
>     parallelism: 2
>     upgradeMode: stateless
>
> *new Spec:*
> apiVersion: flink.apache.org/v1beta1
> kind: FlinkDeployment
> metadata:
>   name: basic-example
> spec:
>   image: flink:1.16
>   flinkVersion: v1_16
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>     kubernetes.operator.deployment.rollback.enabled: true
>     state.savepoints.dir: s3://flink-data/savepoints
>     state.checkpoints.dir: s3://flink-data/checkpoints
>     high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>     high-availability.storageDir: s3://flink-data/ha
>   serviceAccount: flink
>   podTemplate:
>     spec:
>       containers:
>         - env:
>             - name: TZ
>               value: Asia/Shanghai
>   jobManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   taskManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   job:
>     jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
>     parallelism: 2
>     upgradeMode: stateless
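
For anyone reproducing this, the difference between the two specs can be checked mechanically. The snippet below is only a rough sketch and not part of the Flink Kubernetes Operator: the function name pod_template_problems and the dict layout are hypothetical, and the pod templates are written as plain Python dicts instead of being parsed from YAML. It flags containers without a name and a missing flink-main-container entry, which presumably is what leads to the 422 "Required value" error above.

```python
# Hypothetical pre-flight check; this is NOT an operator or Flink API.
MAIN_CONTAINER = "flink-main-container"


def pod_template_problems(pod_template):
    """Return a list of human-readable problems found in a pod template dict."""
    problems = []
    containers = (pod_template.get("spec") or {}).get("containers") or []
    # Kubernetes rejects any container that has no name.
    for index, container in enumerate(containers):
        if not container.get("name"):
            problems.append(f"spec.containers[{index}].name is missing")
    # The report relies on a container with this exact name being present.
    if not any(c.get("name") == MAIN_CONTAINER for c in containers):
        problems.append(f"no container named {MAIN_CONTAINER}")
    return problems


# Pod templates from the two specs in the report, as plain dicts.
last_stable = {
    "spec": {
        "containers": [
            {"name": MAIN_CONTAINER, "env": [{"name": "TZ", "value": "Asia/Shanghai"}]}
        ]
    }
}
new_spec = {
    "spec": {"containers": [{"env": [{"name": "TZ", "value": "Asia/Shanghai"}]}]}
}

print(pod_template_problems(last_stable))
print(pod_template_problems(new_spec))
```

With the two pod templates from the report, the first call returns an empty list and the second reports both the unnamed container and the missing flink-main-container.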