[ https://issues.apache.org/jira/browse/FLINK-32109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723208#comment-17723208 ]
Thomas Weise commented on FLINK-32109:
--------------------------------------
[~gyfora] if the issue can be corrected externally and the deployment can
eventually transition into the running state without intelligence in the
operator, then it is probably best to just wait? What would be useful, though,
is to bubble up the event to the FlinkDeployment level. Similar to genuine
errors that we already interpret, this would require special logic to recognize
the specific condition. That needs to be added on a best-effort basis, and I
think that is OK since it is mostly for convenience (saving the client from
digging into the lower-level resources).
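A minimal sketch of what "special logic to recognize the specific condition" could look like, assuming the observer already has the JobManager pod's Kubernetes Events at hand. `PodEvent`, `KNOWN_STARTUP_REASONS` and `summarize_startup_problem` are hypothetical names used for illustration, not operator APIs. The only real-world inputs are the Event `type` and `reason` fields; `FailedMount` and `FailedAttachVolume` are Warning reasons Kubernetes reports for volume problems. Unknown reasons are simply ignored, which keeps the approach best-effort.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


# Hypothetical, simplified view of a Kubernetes Event attached to the JM pod.
# Only the fields this sketch needs are modeled.
@dataclass(frozen=True)
class PodEvent:
    type: str      # "Normal" or "Warning"
    reason: str    # e.g. "FailedMount"
    message: str


# Best-effort allowlist of Warning reasons worth surfacing; anything else is ignored.
KNOWN_STARTUP_REASONS = {"FailedMount", "FailedAttachVolume"}


def summarize_startup_problem(events: Iterable[PodEvent]) -> Optional[str]:
    """Return a short message to bubble up to the FlinkDeployment status,
    or None if no recognized startup problem is found."""
    for event in events:
        if event.type == "Warning" and event.reason in KNOWN_STARTUP_REASONS:
            return f"JobManager pod startup issue ({event.reason}): {event.message}"
    return None
```

For example, a pod with a `Scheduled` event followed by a `FailedMount` warning yields a one-line summary the client can read from the FlinkDeployment itself, without inspecting the pod's events.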
> Operator doesn't recognize JobManager stuck on volumeMount startup errors
> -------------------------------------------------------------------------
>
> Key: FLINK-32109
> URL: https://issues.apache.org/jira/browse/FLINK-32109
> Project: Flink
> Issue Type: Improvement
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0
> Reporter: Gyula Fora
> Priority: Major
>
> Currently the FlinkDeployment observer logic only reacts to Deployment
> conditions such as failure to create the JM pod, image pull errors, etc.
> Pod startup errors such as volumeMount failures are not recognized as errors,
> and the operator keeps waiting for the pod indefinitely.
> This is a tricky problem because volumeMount errors can be transient and are
> only reported as Events for the pod, so I am not completely sure if we can do
> anything about this.
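One possible way to handle the transient-versus-persistent ambiguity is a heuristic over the Event's recurrence data. Kubernetes core/v1 Events carry `count`, `firstTimestamp` and `lastTimestamp`. The sketch below assumes those values have already been read into a hypothetical `RecurringWarning` record. The thresholds (`MIN_OCCURRENCES`, `MIN_DURATION`, `STALE_AFTER`) are illustrative assumptions, not operator settings, and would need tuning or configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record built from a Kubernetes Event's reason, count,
# firstTimestamp and lastTimestamp fields.
@dataclass(frozen=True)
class RecurringWarning:
    reason: str
    count: int             # how many times the warning has been reported
    first_seen: datetime
    last_seen: datetime


# Hypothetical thresholds; real values would need tuning or configuration.
MIN_OCCURRENCES = 3
MIN_DURATION = timedelta(minutes=5)
STALE_AFTER = timedelta(minutes=2)


def looks_persistent(w: RecurringWarning, now: datetime) -> bool:
    # A warning that has stopped recurring is ignored: the mount may have succeeded since.
    if now - w.last_seen > STALE_AFTER:
        return False
    # Require repetition over a time window, so a single transient failure is not escalated.
    return w.count >= MIN_OCCURRENCES and (w.last_seen - w.first_seen) >= MIN_DURATION
```

This only changes what gets reported. The operator would still wait for the pod to start, which matches the option of letting an externally fixed problem resolve itself.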