[ https://issues.apache.org/jira/browse/FLINK-29708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621318#comment-17621318 ]
Gyula Fora commented on FLINK-29708:
------------------------------------

Looks good! Maybe we could shorten operatorErrorType -> type.

I am a bit torn about httpResponseCode; we should probably only include it for specific error types.

> Enrich Flink Kubernetes Operator CRD error field
> ------------------------------------------------
>
>                 Key: FLINK-29708
>                 URL: https://issues.apache.org/jira/browse/FLINK-29708
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.3.0
>            Reporter: Daren Wong
>            Assignee: Daren Wong
>            Priority: Major
>             Fix For: kubernetes-operator-1.3.0
>
>
> h1. Problem Statement:
> The FlinkDeployment and FlinkSessionJob CRDs have a CommonStatus error field of
> String type. Currently, this field stores various errors, such as:
> * CR validation errors
> * Missing SessionJob / missing JobManager deployment errors
> * Unknown job errors
> * DeploymentFailedException
> * ReconciliationError, such as a RestClientException from Flink internals
> (e.g. FlinkRest and FlinkRuntime)
> Storing each error only as a plain string is insufficient. We need to
> include some exception metadata to help the operator handle each error
> accordingly. For example, it is very useful to know the HttpResponseStatus
> code from a RestClientException.
> h1. Proposed Solution:
> * The error field should store a JSON object with exception metadata. For example:
> {code:json}
> {
>   "operatorErrorType": "JobManagerNotFoundException",
>   "message": "JobManager with leadership ID: 1234 was not found",
>   "stackTrace": "JobManager lost connection at ....",
>   "httpResponseCode": "400"
> }
> {code}
> * The stackTrace field can be enabled or disabled via a spec change.
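To make the proposed payload concrete, here is a minimal Java sketch of turning an exception into the error-field JSON. It is not the operator's implementation. It assumes hand-built JSON (a real operator would more likely use a JSON library such as Jackson), uses a hypothetical `HttpStatusException` as a stand-in for exceptions that carry an HTTP status (such as `RestClientException`), and models the proposed spec toggle as a plain `includeStackTrace` boolean. Field names follow the proposal (`operatorErrorType` rather than the shorter `type` suggested in the comment), and, in line with the review feedback, `httpResponseCode` is emitted only for exceptions that actually carry a status.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class ErrorFieldSketch {

    // Hypothetical exception carrying an HTTP status code; a stand-in for
    // exceptions such as RestClientException, NOT Flink's real API.
    public static class HttpStatusException extends RuntimeException {
        public final int httpResponseCode;

        public HttpStatusException(String message, int httpResponseCode) {
            super(message);
            this.httpResponseCode = httpResponseCode;
        }
    }

    // Escapes a string as a JSON string literal; null becomes the JSON null literal.
    public static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters must be escaped in JSON strings.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds the error-field JSON; stackTrace is added only when the
    // (assumed) toggle is enabled, httpResponseCode only when available.
    public static String toErrorJson(Throwable t, boolean includeStackTrace) {
        StringBuilder json = new StringBuilder("{");
        json.append("\"operatorErrorType\":").append(quote(t.getClass().getSimpleName()));
        json.append(",\"message\":").append(quote(t.getMessage()));
        if (includeStackTrace) {
            StringWriter sw = new StringWriter();
            t.printStackTrace(new PrintWriter(sw));
            json.append(",\"stackTrace\":").append(quote(sw.toString()));
        }
        if (t instanceof HttpStatusException) {
            int code = ((HttpStatusException) t).httpResponseCode;
            // Kept as a string to match the example payload in the proposal.
            json.append(",\"httpResponseCode\":").append(quote(String.valueOf(code)));
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        Throwable err = new HttpStatusException(
                "JobManager with leadership ID: 1234 was not found", 400);
        System.out.println(toErrorJson(err, false));
    }
}
```

Running `main` prints a single-line JSON object containing `operatorErrorType`, `message`, and `httpResponseCode`; passing `true` as the second argument adds the `stackTrace` field. An exception without a status (for example, a plain `IllegalStateException`) yields only `operatorErrorType` and `message`.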