zhenyuT opened a new issue, #3618: URL: https://github.com/apache/incubator-streampark/issues/3618
### Search before asking - [X] I had searched in the [issues](https://github.com/apache/incubator-streampark/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues. ### Java Version java version "1.8.0_181" Java(TM) SE Runtime Environment (build 1.8.0_181-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) ### Scala Version 2.12.x ### StreamPark Version 2.1.3 ### Flink Version 1.15.3 ### deploy mode kubernetes-application ### What happened k8s环境不稳定,导致flink pod自动重启,重启后flink任务恢复正常,但是streampark页面上显示任务状态为FAIL   查看代码和日志,发现主要问题在于,watch进程在进行flink任务状态监听时,请求flink web接口出现异常,再去请求k8s api server判断deployment是否存在的时候也发生了异常(etcd server leader发生了变更)。这里KubernetesRetriever.isDeploymentExists逻辑存在问题,发生异常时返回false,即认为deployment不存在,但是实际这时候deployment是存在的。由于错误的识别到deployment不存在,于是把任务状态更新为FAIL,并且终止了任务的监听,即使后面k8s恢复,streampark上的任务状态也不会恢复。 通过chaosblade工具模拟网络丢包,可以轻松的重现这个问题。 chaosblade文档见:https://chaosblade-io.gitbook.io/chaosblade-help-zh-cn/blade-create-network-loss 我这里的环境,k8s api server请求地址为https://172.18.64.242:8443, flink web接口的访问地址为https://172.18.64.242:443 于是我注入网络丢包的命令为:./blade create network loss --percent 100 --interface eth0 --remote-port 443,8443 --destination-ip 172.18.64.242 --timeout 200 简单解释下:这里是把请求到IP为172.18.64.242,端口为443和8443的网络请求丢包率设置为100%,持续200秒 启动flink任务后,再通过命令注入网络故障,再查看streampark页面,就会发现任务状态变为FAIL,即使后面网络故障恢复,任务状态也不会恢复为RUNNING(期间flink任务本身一直正常工作) ### Error Exception _No response_ ### Screenshots _No response_ ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR!(您是否要贡献这个PR?) ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@streampark.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org