[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923746#comment-16923746 ]

lioz nudel commented on SPARK-28921:
------------------------------------

Hi [~dongjoon], [~skonto]

We just encountered the same issue at Fyber and found some details that might 
help.

It looks like AWS is rolling this update out to their EKS clusters; it didn't 
happen everywhere at once.

The issue started on 30.08.2019 on one of our clusters. One of our workarounds 
was migrating to another, new cluster, which worked for several hours and then 
started to fail with the same error.

I noticed that the "fabric8-rbac" clusterrolebinding was changed exactly when 
the issue started on the new cluster.

Maybe you can try creating a cluster and watching "fabric8-rbac" for changes.
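A minimal way to do that (assuming kubectl access to an affected cluster; 
"fabric8-rbac" is the clusterrolebinding name we saw change) would be:

{code:bash}
# Dump the current state of the clusterrolebinding for later comparison
kubectl get clusterrolebinding fabric8-rbac -o yaml > fabric8-rbac-before.yaml

# Watch it for modifications; a change here should line up with the moment jobs start failing
kubectl get clusterrolebinding fabric8-rbac -o yaml --watch
{code}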

We still haven't received any confirmation from AWS about this change.

 

I think you should add all Spark versions up to 2.4.4 to the affected versions.

Compiling an older version with the new K8s client solved the issue, but 
there's a problem with 2.4.0, which can't be compiled with the new client 
dependency.
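For anyone patching their own build, the change amounts to forcing the newer 
client artifact. A sketch of the override in a Maven pom (version per the 
upgrade suggested in the issue description; the exact integration with Spark's 
build may differ):

{code:xml}
<dependencyManagement>
  <dependencies>
    <!-- Force the patched fabric8 client that handles the new API-server behavior -->
    <dependency>
      <groupId>io.fabric8</groupId>
      <artifactId>kubernetes-client</artifactId>
      <version>4.4.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}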

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, 
> 1.12.10, 1.11.10)
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-28921
>                 URL: https://issues.apache.org/jira/browse/SPARK-28921
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.3.3, 2.4.3
>            Reporter: Paul Schweigert
>            Assignee: Andy Grove
>            Priority: Major
>             Fix For: 2.4.5, 3.0.0
>
>
> Spark jobs are failing on latest versions of Kubernetes when jobs attempt to 
> provision executor pods (jobs like Spark-Pi that do not launch executors run 
> without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors from Kubernetes.
> 19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden' 
>     at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
>     at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
>     at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
>     at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
>     at java.lang.Thread.run(Thread.java:748) 
> {code}
>  
> Looks like the issue is caused by the fix for a recent CVE: 
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
