[https://issues.apache.org/jira/browse/SPARK-40954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Anton Ippolitov updated SPARK-40954:
------------------------------------
Attachment: TestProcess.scala
> Kubernetes integration tests stuck forever on Mac M1 with Minikube + Docker
> ---------------------------------------------------------------------------
>
> Key: SPARK-40954
> URL: https://issues.apache.org/jira/browse/SPARK-40954
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, Tests
> Affects Versions: 3.3.1
> Environment: MacOS 12.6 (Mac M1)
> Minikube 1.27.1
> Docker 20.10.17
> Reporter: Anton Ippolitov
> Priority: Minor
> Attachments: TestProcess.scala
>
>
> h2. Description
> I tried running Kubernetes integration tests with the Minikube backend (+
> Docker driver) from commit c26d99e3f104f6603e0849d82eca03e28f196551 on
> Spark's master branch. I ran them with the following command:
>
> {code:java}
> mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
> -Pkubernetes -Pkubernetes-integration-tests \
> -Phadoop-3 \
> -Dspark.kubernetes.test.imageTag=MY_IMAGE_TAG_HERE \
> -Dspark.kubernetes.test.imageRepo=docker.io/kubespark \
> -Dspark.kubernetes.test.namespace=spark \
> -Dspark.kubernetes.test.serviceAccountName=spark \
> -Dspark.kubernetes.test.deployMode=minikube {code}
> However, the test suite hung for hours on my machine without making progress.
>
> h2. Investigation
> I ran {{jstack}} on the process that was running the tests and saw that it
> was stuck here:
>
> {noformat}
> "ScalaTest-main-running-KubernetesSuite" #1 prio=5 os_prio=31 tid=0x00007f78d580b800 nid=0x2503 runnable [0x0000000304749000]
> java.lang.Thread.State: RUNNABLE
> at java.io.FileInputStream.readBytes(Native Method)
> at java.io.FileInputStream.read(FileInputStream.java:255)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> - locked <0x000000076c0b6f40> (a java.lang.UNIXProcess$ProcessPipeInputStream)
> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
> - locked <0x000000076c0bb410> (a java.io.InputStreamReader)
> at java.io.InputStreamReader.read(InputStreamReader.java:184)
> at java.io.BufferedReader.fill(BufferedReader.java:161)
> at java.io.BufferedReader.readLine(BufferedReader.java:324)
> - locked <0x000000076c0bb410> (a java.io.InputStreamReader)
> at java.io.BufferedReader.readLine(BufferedReader.java:389)
> at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
> at scala.collection.Iterator.foreach(Iterator.scala:943)
> at scala.collection.Iterator.foreach$(Iterator.scala:943)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
> at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2(ProcessUtils.scala:45)
> at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2$adapted(ProcessUtils.scala:45)
> at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$$$Lambda$322/20156341.apply(Unknown Source)
> at org.apache.spark.deploy.k8s.integrationtest.Utils$.tryWithResource(Utils.scala:49)
> at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.executeProcess(ProcessUtils.scala:45)
> at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.executeMinikube(Minikube.scala:103)
> at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.minikubeServiceAction(Minikube.scala:112)
> at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$getServiceUrl$1(DepsTestsSuite.scala:281)
> at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite$$Lambda$611/1461360262.apply(Unknown Source)
> at org.scalatest.enablers.Retrying$$anon$4.makeAValiantAttempt$1(Retrying.scala:184)
> at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:196)
> at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226)
> at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:313)
> at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:312)
> at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457)
> at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.getServiceUrl(DepsTestsSuite.scala:278)
> at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.tryDepsTest(DepsTestsSuite.scala:325)
> at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160)
> at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite$$Lambda$178/1750286943.apply$mcV$sp(Unknown Source)
> [...]{noformat}
> So the issue comes from {{DepsTestsSuite}} while it is setting up
> {{minio}}. After [creating the minio StatefulSet and Service|https://github.com/apache/spark/blob/5ea2b386eb866e20540660cdb6ed43792cb29969/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L85],
> it [executes|https://github.com/apache/spark/blob/5ea2b386eb866e20540660cdb6ed43792cb29969/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L280-L281]
> the {{minikube service -n spark minio-s3 --url}} command. It then gets
> stuck in {{ProcessUtils}} while reading {{minikube}}'s stdout
> [here|https://github.com/apache/spark/blob/c8b7a09d39bdbda1502a7580fe2b54b7cb0ac4e3/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ProcessUtils.scala#L44-L50].
> I then ran the same command from my shell and confirmed that it never returns
> until it is interrupted with Ctrl+C:
> {noformat}
> $ minikube service -n spark minio-s3 --url
> http://127.0.0.1:63114
> ❗ Because you are using a Docker driver on darwin, the terminal needs to be open to run it.
> <COMMAND IS STILL RUNNING HERE>{noformat}
> So this appears to be the expected behaviour of the {{minikube service}} command
> on macOS with the Docker driver: it has to keep a tunnel open. I had a quick
> look at Minikube's source code, and it seems to happen here:
> [https://github.com/kubernetes/minikube/blob/abed8b7d347ae15fe9c0acd91b5b49b3b6494a53/cmd/minikube/cmd/service.go#L154]
> The docs also seem to confirm it:
> [https://minikube.sigs.k8s.io/docs/handbook/accessing/]
> As a result, the code in {{ProcessUtils}} that reads the command's stdout line by line until EOF hangs indefinitely, because EOF never arrives while the tunnel is open.
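A minimal Scala sketch (not Spark code) that reproduces the hang in isolation. It assumes a POSIX `sh` is available. The command `sh -c 'echo ...; exec sleep 60'` is a hypothetical stand-in for `minikube service ... --url`, and the helper name `EofReadRepro.drainsWithin` is likewise made up for illustration. The read loop has the same shape as in the stack trace: `getLines().foreach` only returns when `BufferedLineIterator.hasNext` sees EOF.

```scala
import java.util.concurrent.TimeUnit
import scala.concurrent.{Await, Future, TimeoutException}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.io.Source

// Hypothetical reproduction, not Spark code. `sh -c '...; exec sleep 60'` stands in
// for `minikube service ... --url`: it prints a URL right away and then never exits.
object EofReadRepro {

  /** Returns true if reading the command's stdout until EOF finishes within `timeout`. */
  def drainsWithin(command: Seq[String], timeout: FiniteDuration): Boolean = {
    val proc = new ProcessBuilder(command: _*).redirectErrorStream(true).start()
    // Same pattern as in the stack trace: getLines().foreach only stops when
    // hasNext sees EOF, i.e. when the child closes its stdout (normally by exiting).
    val drained = Future {
      val src = Source.fromInputStream(proc.getInputStream, "UTF-8")
      try src.getLines().foreach(line => println(s"read: $line"))
      finally src.close()
    }
    try {
      Await.ready(drained, timeout)
      true
    } catch {
      case _: TimeoutException => false // the URL line was read, but EOF never came
    } finally {
      // Kill the child so the blocked reader thread finally sees EOF.
      proc.destroyForcibly()
      proc.waitFor(5, TimeUnit.SECONDS)
    }
  }
}
```

With a command that exits after printing its URL, `drainsWithin` returns true. With one that stays alive afterwards, it returns false even though the URL line has already been read, which matches what happens with minikube here.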
>
> I am not sure what the best solution is here. Ideally, I think we should run
> the {{minikube service}} command, retrieve the URL without blocking, and at the
> same time keep the command running. When {{DepsTestsSuite}} terminates, we
> should not forget to terminate the {{minikube}} process too.
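One possible shape for that idea, as a hedged Scala sketch rather than a tested patch. The class `ServiceUrlTunnel`, the thread name, and the "first line starting with http:// or https:// is the URL" heuristic are all assumptions made for illustration. The sketch starts the command, keeps draining its output on a daemon thread, completes a `Promise` with the first URL-looking line, and leaves the process running until `close()` is called.

```scala
import java.io.{BufferedReader, IOException, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.concurrent.TimeUnit
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._
import scala.util.control.NonFatal

/** Hypothetical helper, not part of Spark: returns the first URL a long-running
  * command prints without waiting for EOF, and keeps the command alive until close(). */
final class ServiceUrlTunnel(command: Seq[String], timeout: FiniteDuration)
    extends AutoCloseable {

  private val proc: Process =
    new ProcessBuilder(command: _*).redirectErrorStream(true).start()
  private val reader = new BufferedReader(
    new InputStreamReader(proc.getInputStream, StandardCharsets.UTF_8))
  private val urlPromise = Promise[String]()

  // Daemon thread that drains stdout for the whole lifetime of the process, so the
  // pipe never fills up; it completes the promise with the first URL-looking line.
  private val drainer = new Thread(new Runnable {
    override def run(): Unit =
      try {
        var line = reader.readLine()
        while (line != null) {
          val trimmed = line.trim
          if (trimmed.startsWith("http://") || trimmed.startsWith("https://")) {
            urlPromise.trySuccess(trimmed)
          }
          line = reader.readLine()
        }
        // EOF: a no-op if a URL was already found, otherwise report the failure.
        urlPromise.tryFailure(
          new IllegalStateException(s"No URL printed by: ${command.mkString(" ")}"))
      } catch {
        case e: IOException => urlPromise.tryFailure(e)
      }
  }, "service-url-drainer")
  drainer.setDaemon(true)
  drainer.start()

  /** The service URL; construction fails (and the process is stopped) after `timeout`. */
  val url: String =
    try Await.result(urlPromise.future, timeout)
    catch { case NonFatal(e) => close(); throw e }

  def isAlive: Boolean = proc.isAlive

  /** Stops the command, e.g. from the test suite's afterAll(). */
  override def close(): Unit = {
    proc.destroy()
    if (!proc.waitFor(5, TimeUnit.SECONDS)) {
      proc.destroyForcibly()
      proc.waitFor(5, TimeUnit.SECONDS)
    }
  }
}
```

In the suite, this could be created with `Seq("minikube", "service", "-n", "spark", "minio-s3", "--url")` while setting up minio and closed from `afterAll()`. Draining on a separate thread, instead of stopping after the URL, is meant to keep any further output from minikube from filling the pipe and blocking the tunnel.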
--
This message was sent by Atlassian Jira
(v8.20.10#820010)