[ https://issues.apache.org/jira/browse/SPARK-35334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380390#comment-17380390 ]
Attila Zsolt Piros commented on SPARK-35334:
--------------------------------------------

[~dongjoon] Oh, sorry, this is a feature; I just left the issue type on the default. I will try to correct it!

> Spark should be more resilient to intermittent K8s flakiness
> ------------------------------------------------------------
>
>                 Key: SPARK-35334
>                 URL: https://issues.apache.org/jira/browse/SPARK-35334
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>            Priority: Major
>             Fix For: 3.3.0
>
>
> Internal K8s errors, such as an etcdserver leader election, are propagated to
> the API client and can cause serious issues in Spark, for example:
> {noformat}
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure
> executing: GET at:
> https://kubernetes.default.svc/api/v1/namespaces/dex-app-bl24w4z9/pods/sparkpi-10-fcd3f6781a874212-driver.
> Message: etcdserver:
> leader changed. Received status: Status(apiVersion=v1, code=500,
> details=null, kind=Status, message=etcdserver: leader changed,
> metadata=ListMeta(_continue=null, remainingItemCount=null,
> resourceVersion=null, selfLink=null, additionalProperties={}), reason=null,
> status=Failure, additionalProperties={}).
> {noformat}
> First, I will try to fix this in kubernetes-client by adding retries with
> exponential backoff:
> https://github.com/fabric8io/kubernetes-client/issues/3087
> If that works out, the Spark side could be just a version update plus
> some new configs.
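
For readers unfamiliar with the idea, the "retries with exponential backoff" mentioned in the issue description can be sketched roughly as below. This is only an illustration under stated assumptions, not the kubernetes-client or Spark implementation: the class name RetryWithBackoff, its methods, and all parameter values are hypothetical, and the transient failure is simulated with a plain IllegalStateException instead of a real KubernetesClientException. The sketch also omits jitter, which a production implementation would likely add.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of fabric8 kubernetes-client or Spark.
final class RetryWithBackoff {

    /** Delay before retry number `attempt` (0-based): initial * multiplier^attempt, capped at maxDelayMs. */
    static long backoffDelayMs(int attempt, long initialDelayMs, double multiplier, long maxDelayMs) {
        double delay = initialDelayMs * Math.pow(multiplier, attempt);
        return (long) Math.min(delay, (double) maxDelayMs);
    }

    /** Runs `call` up to maxAttempts times, sleeping between attempts only for transient failures. */
    static <T> T retry(Supplier<T> call, int maxAttempts, long initialDelayMs,
                       double multiplier, long maxDelayMs,
                       Predicate<RuntimeException> isTransient) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 0; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                boolean lastAttempt = attempt + 1 >= maxAttempts;
                // Give up on the last attempt or on errors that are not worth retrying.
                if (lastAttempt || !isTransient.test(e)) {
                    throw e;
                }
                sleep(backoffDelayMs(attempt, initialDelayMs, multiplier, maxDelayMs));
            }
        }
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("Interrupted while backing off", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated API call: fails twice with a "leader changed" error, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("etcdserver: leader changed");
            }
            return "pod fetched";
        }, 5, 100, 2.0, 1000,
           e -> e.getMessage() != null && e.getMessage().contains("leader changed"));
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In this sketch the waits before the second and third attempts are 100 ms and 200 ms. Matching on the message text is a simplification for the example; a real client would more plausibly decide based on the HTTP status code (500 in the trace above).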