[ 
https://issues.apache.org/jira/browse/FLINK-32109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723208#comment-17723208
 ] 

Thomas Weise commented on FLINK-32109:
--------------------------------------

[~gyfora] if the issue can be corrected externally and the deployment can 
eventually transition into the running state without any special logic in the 
operator, then it is probably best to just wait? What would be useful, though, 
is to bubble up the event to the FlinkDeployment level. As with the genuine 
errors that we already interpret, this would require special logic to recognize 
the specific condition. That needs to be added on a best-effort basis, and I 
think that is OK since it is mostly for convenience (sparing the client from 
digging into the lower-level resources).
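
To make the "special logic" idea concrete, here is a rough sketch of what 
recognizing the condition could look like. It is only an illustration under 
stated assumptions: PodEventBubbler, PodEvent and findVolumeMountWarning are 
hypothetical names, not existing operator classes, and PodEvent is a simplified 
stand-in for the Kubernetes Event objects attached to the JM pod. The reasons 
"FailedMount" and "FailedAttachVolume" are assumed to be the ones relevant 
here. The sketch only reports the event and does not mark the deployment as 
failed, since the error may be transient.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of the Flink Kubernetes Operator code base.
final class PodEventBubbler {

    // Simplified stand-in for a Kubernetes Event attached to the JobManager pod.
    static final class PodEvent {
        final String type;    // "Normal" or "Warning"
        final String reason;  // e.g. "FailedMount"
        final String message; // human-readable detail from the event

        PodEvent(String type, String reason, String message) {
            this.type = type;
            this.reason = reason;
            this.message = message;
        }
    }

    // Event reasons assumed to indicate volume problems; best effort, not exhaustive.
    static final Set<String> VOLUME_REASONS = Set.of("FailedMount", "FailedAttachVolume");

    // Returns a message to surface on the FlinkDeployment, or empty if nothing matched.
    static Optional<String> findVolumeMountWarning(List<PodEvent> events) {
        // Scan from newest to oldest, assuming the list is ordered oldest first.
        for (int i = events.size() - 1; i >= 0; i--) {
            PodEvent e = events.get(i);
            if ("Warning".equals(e.type) && VOLUME_REASONS.contains(e.reason)) {
                return Optional.of("JobManager pod: " + e.reason + ": " + e.message);
            }
        }
        return Optional.empty();
    }
}
```

The observer could call something like findVolumeMountWarning with the pod's 
events while the deployment is still waiting and, if a message comes back, emit 
it as an event on the FlinkDeployment resource while continuing to wait.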

> Operator doesn't recognize JobManager stuck on volumeMount startup errors
> -------------------------------------------------------------------------
>
>                 Key: FLINK-32109
>                 URL: https://issues.apache.org/jira/browse/FLINK-32109
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0
>            Reporter: Gyula Fora
>            Priority: Major
>
> Currently the Flink deployment observer logic only reacts to Deployment 
> conditions such as failure to create the JM pod, image pull errors, etc.
> Pod startup errors such as volumeMount failures are not recognized as errors, 
> and the operator keeps waiting indefinitely.
> This is a tricky problem because volumeMount errors can be transient and are 
> only reported as Events for the pod, so I am not completely sure if we can do 
> anything about this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
