[ https://issues.apache.org/jira/browse/SPARK-40954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Ippolitov updated SPARK-40954:
------------------------------------
    Description: 
h2. Description

I tried running the Kubernetes integration tests with the Minikube backend (using the Docker driver) from commit c26d99e3f104f6603e0849d82eca03e28f196551 on Spark's master branch, with the following command:

{code:java}
mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
                        -Pkubernetes -Pkubernetes-integration-tests \
                        -Phadoop-3 \
                        -Dspark.kubernetes.test.imageTag=MY_IMAGE_TAG_HERE \
                        -Dspark.kubernetes.test.imageRepo=docker.io/kubespark \
                        -Dspark.kubernetes.test.namespace=spark \
                        -Dspark.kubernetes.test.serviceAccountName=spark \
                        -Dspark.kubernetes.test.deployMode=minikube  {code}
However, the test suite hung for hours on my machine.

h2. Investigation

I ran {{jstack}} on the process running the tests and saw that its main thread was stuck here:

{noformat}
"ScalaTest-main-running-KubernetesSuite" #1 prio=5 os_prio=31 tid=0x00007f78d580b800 nid=0x2503 runnable [0x0000000304749000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:255)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    - locked <0x000000076c0b6f40> (a java.lang.UNIXProcess$ProcessPipeInputStream)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    - locked <0x000000076c0bb410> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    - locked <0x000000076c0bb410> (a java.io.InputStreamReader)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2(ProcessUtils.scala:45)
    at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2$adapted(ProcessUtils.scala:45)
    at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$$$Lambda$322/20156341.apply(Unknown Source)
    at org.apache.spark.deploy.k8s.integrationtest.Utils$.tryWithResource(Utils.scala:49)
    at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.executeProcess(ProcessUtils.scala:45)
    at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.executeMinikube(Minikube.scala:103)
    at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.minikubeServiceAction(Minikube.scala:112)
    at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$getServiceUrl$1(DepsTestsSuite.scala:281)
    at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite$$Lambda$611/1461360262.apply(Unknown Source)
    at org.scalatest.enablers.Retrying$$anon$4.makeAValiantAttempt$1(Retrying.scala:184)
    at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:196)
    at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226)
    at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:313)
    at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:312)
    at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457)
    at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.getServiceUrl(DepsTestsSuite.scala:278)
    at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.tryDepsTest(DepsTestsSuite.scala:325)
    at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160)
    at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite$$Lambda$178/1750286943.apply$mcV$sp(Unknown Source)
[...]{noformat}
So the issue comes from {{DepsTestsSuite}} while it is setting up {{minio}}. After [creating the minio StatefulSet and Service|https://github.com/apache/spark/blob/5ea2b386eb866e20540660cdb6ed43792cb29969/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L85], it [executes|https://github.com/apache/spark/blob/5ea2b386eb866e20540660cdb6ed43792cb29969/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L280-L281] the {{minikube service -n spark minio-s3 --url}} command. It then gets stuck in {{ProcessUtils}} while reading {{minikube}}'s stdout [here|https://github.com/apache/spark/blob/c8b7a09d39bdbda1502a7580fe2b54b7cb0ac4e3/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ProcessUtils.scala#L44-L50].

I then ran the same command from my shell and confirmed that it does not return until interrupted with CTRL+C:
{noformat}
$ minikube service -n spark minio-s3 --url
http://127.0.0.1:63114
❗  Because you are using a Docker driver on darwin, the terminal needs to be open to run it.

<COMMAND IS STILL RUNNING HERE>{noformat}
So this appears to be the expected behaviour of the {{minikube service}} command on macOS with the Docker driver: it has to keep a tunnel open, so it keeps running after printing the URL. From a quick look at Minikube's source code, this seems to happen here: [https://github.com/kubernetes/minikube/blob/abed8b7d347ae15fe9c0acd91b5b49b3b6494a53/cmd/minikube/cmd/service.go#L154]

The docs also seem to confirm it: [https://minikube.sigs.k8s.io/docs/handbook/accessing/]

Because the process never exits, its stdout never reaches EOF, so the code that reads it line by line blocks indefinitely. I was also able to reproduce this with a self-contained example; see the attached {{TestProcess.scala}} file (it assumes there is a {{minio-s3}} Service in the {{spark}} Namespace).

 

I am not sure what the best solution would be here. Ideally, I think we should start the {{minikube service}} command, retrieve the URL without blocking, and leave the command running. When {{DepsTestsSuite}} terminates, we should also make sure to terminate the {{minikube}} process.



> Kubernetes integration tests stuck forever on Mac M1 with Minikube + Docker
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-40954
>                 URL: https://issues.apache.org/jira/browse/SPARK-40954
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Tests
>    Affects Versions: 3.3.1
>         Environment: MacOS 12.6 (Mac M1)
> Minikube 1.27.1
> Docker 20.10.17
>            Reporter: Anton Ippolitov
>            Priority: Minor
>         Attachments: TestProcess.scala



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
