Ketan Doshi created SPARK-35284:
-----------------------------------
Summary: Kubernetes Fabric exception with Scala programs in Spark 3.x
Key: SPARK-35284
URL: https://issues.apache.org/jira/browse/SPARK-35284
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 3.0.0
Environment: Docker Desktop v3.2 on Windows 10. Kubernetes v1.19.7.
Apps are launched with the latest Spark Operator. Kafka with Confluent Platform 6.0.
Reporter: Ketan Doshi
An exception occurs when running a small Scala app on Spark 3.x on Kubernetes.
Python programs work fine. The applications are launched using Spark Operator.
The app uses Spark Structured Streaming and reads and writes JSON data to a Kafka
topic. This happens during development, so only 5-10 small records are being
written, and the app doesn't run for more than 3-4 minutes.
The error is intermittent, but it results in different failure scenarios,
which makes Scala apps very unstable:
e.g. Kafka read succeeds but Kafka write fails
e.g. writes to Console or Memory don't work at all; no output is produced
e.g. read from a file stream and write to Kafka usually works
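The application code itself is not part of this report. As a rough illustration only, a Structured Streaming app of the kind described (read JSON from a Kafka topic, write JSON back to a Kafka topic) might look like the sketch below. The broker address, topic names, JSON fields, and checkpoint path are all hypothetical placeholders, and the sketch assumes the spark-sql-kafka-0-10 connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, struct, to_json}
import org.apache.spark.sql.types.{StringType, StructType}

object KafkaJsonStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("kafka-json-sketch").getOrCreate()

    // Hypothetical values; the real broker and topic names are not in the report.
    val brokers = "kafka:9092"
    val inputTopic = "input-topic"
    val outputTopic = "output-topic"

    // Hypothetical JSON layout of the small test records.
    val schema = new StructType()
      .add("id", StringType)
      .add("value", StringType)

    // Read the Kafka topic as a stream; the record payload arrives as bytes.
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("subscribe", inputTopic)
      .option("startingOffsets", "earliest")
      .load()

    // Parse the payload as JSON into typed columns.
    val parsed = input
      .select(from_json(col("value").cast("string"), schema).as("data"))
      .select("data.*")

    // Serialize back to JSON and write to the output topic.
    // The Kafka sink expects a string or binary column named "value".
    val query = parsed
      .select(to_json(struct(col("id"), col("value"))).as("value"))
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("topic", outputTopic)
      .option("checkpointLocation", "/tmp/checkpoint-sketch")
      .start()

    query.awaitTermination()
  }
}
```

Swapping the sink to `format("console")` or `format("memory")` gives the Console and Memory variants mentioned above.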
21/04/30 10:24:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.1.14.118, executor 1, partition 0, PROCESS_LOCAL, 8414 bytes)
21/04/30 10:24:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.1.14.118:43469 (size: 9.5 KiB, free: 117.0 MiB)
21/04/30 10:24:53 ERROR Utils: Uncaught exception in thread kubernetes-executor-pod-polling-sync
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [list] for kind: [Pod] with name: [null] in namespace: [spark-app] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:155)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:621)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:70)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsPollingSnapshotSource$PollRunnable.$anonfun$run$1(ExecutorPodsPollingSnapshotSource.scala:61)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsPollingSnapshotSource$PollRunnable.run(ExecutorPodsPollingSnapshotSource.scala:56)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: timeout
at okio.Okio$4.newTimeoutException(Okio.java:232)
at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226)
at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:134)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:109)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
at okhttp3.RealCall.execute(RealCall.java:93)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:469)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:412)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:151)
... 11 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.base/java.net.SocketInputStream.socketRead0(Native Method)
at java.base/java.net.SocketInputStream.socketRead(Unknown Source)
at java.base/java.net.SocketInputStream.read(Unknown Source)
at java.base/java.net.SocketInputStream.read(Unknown Source)
at java.base/sun.security.ssl.SSLSocketInputRecord.read(Unknown Source)
at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(Unknown Source)
at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(Unknown Source)
at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(Unknown Source)
at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(Unknown Source)
at okio.Okio$2.read(Okio.java:140)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
... 43 more