Canbin Zheng created FLINK-17090:
------------------------------------
Summary: Harden precheck for the KubernetesConfigOptions.JOB_MANAGER_CPU
Key: FLINK-17090
URL: https://issues.apache.org/jira/browse/FLINK-17090
Project: Flink
Issue Type: Improvement
Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Canbin Zheng
Fix For: 1.11.0
If a user specifies a negative value for the config option
{{KubernetesConfigOptions#JOB_MANAGER_CPU}}, as in the following command,
{code:bash}
./bin/kubernetes-session.sh -Dkubernetes.jobmanager.cpu=-3.0 -Dkubernetes.cluster-id=...
{code}
then the cluster deployment fails with the following exception:
{quote}org.apache.flink.client.deployment.ClusterDeploymentException: Could not
create Kubernetes cluster "felix1".
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:192)
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:129)
    at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:108)
    at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:185)
    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:185)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure
executing: POST at:
[https://cls-cf5wqdwy.ccs.tencent-cloud.com/apis/apps/v1/namespaces/default/deployments].
Message: Deployment.apps "felix1" is invalid:
[spec.template.spec.containers[0].resources.limits[cpu]: Invalid value: "-3":
must be greater than or equal to 0,
spec.template.spec.containers[0].resources.requests[cpu]: Invalid value: "-3":
must be greater than or equal to 0]. Received status: Status(apiVersion=v1,
code=422,
details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].resources.limits[cpu],
message=Invalid value: "-3": must be greater than or equal to 0,
reason=FieldValueInvalid, additionalProperties={}),
StatusCause(field=spec.template.spec.containers[0].resources.requests[cpu],
message=Invalid value: "-3": must be greater than or equal to 0,
reason=FieldValueInvalid, additionalProperties={})], group=apps,
kind=Deployment, name=felix1, retryAfterSeconds=null, uid=null,
additionalProperties={}), kind=Status, message=Deployment.apps "felix1" is
invalid: [spec.template.spec.containers[0].resources.limits[cpu]: Invalid
value: "-3": must be greater than or equal to 0,
spec.template.spec.containers[0].resources.requests[cpu]: Invalid value: "-3":
must be greater than or equal to 0], metadata=ListMeta(_continue=null,
resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid,
status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:510)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:449)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:413)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:798)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:328)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:324)
    at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.createJobManagerComponent(Fabric8FlinkKubeClient.java:83)
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:182)
{quote}
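Since there is a gap between the Flink-side and the Kubernetes-side
configuration models, the invalid value is rejected only by the Kubernetes API
server, after Flink has already submitted the Deployment. This ticket proposes
hardening the precheck in the Flink Kubernetes parameter parsing utilities so
that such a value fails early with a more user-friendly exception message, such
as "the value of {{kubernetes.jobmanager.cpu}} must be greater than or equal to 0".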
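The proposed precheck could look roughly like the sketch below. It is only an
illustration in plain Java, not the actual Flink change: the class name
JobManagerCpuPrecheck and the method checkJobManagerCpu are hypothetical, and
the option key is hard-coded here rather than read from
KubernetesConfigOptions.

```java
// Hypothetical helper, not part of Flink: validates the JobManager CPU value
// on the client side before anything is sent to the Kubernetes API server.
final class JobManagerCpuPrecheck {

    // Option key taken from the command in this ticket.
    static final String JOB_MANAGER_CPU_KEY = "kubernetes.jobmanager.cpu";

    private JobManagerCpuPrecheck() {}

    // Returns the value unchanged if it is valid; otherwise fails fast with a
    // message that names the offending option instead of a Kubernetes field path.
    static double checkJobManagerCpu(double cpu) {
        if (Double.isNaN(cpu) || cpu < 0) {
            throw new IllegalArgumentException(
                    "The value of " + JOB_MANAGER_CPU_KEY
                            + " must be greater than or equal to 0, but was " + cpu);
        }
        return cpu;
    }

    public static void main(String[] args) {
        // A valid value passes through untouched.
        System.out.println(checkJobManagerCpu(1.0));

        // The value from the failing command is rejected before deployment.
        try {
            checkJobManagerCpu(-3.0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check of this kind in the parameter parsing step, the user would see a
single message naming {{kubernetes.jobmanager.cpu}} rather than the HTTP 422
response from the Kubernetes API server.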