[ https://issues.apache.org/jira/browse/SPARK-24547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Ramanathan resolved SPARK-24547.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

> Spark on K8s docker-image-tool.sh improvements
> ----------------------------------------------
>
>                 Key: SPARK-24547
>                 URL: https://issues.apache.org/jira/browse/SPARK-24547
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.4.0
>            Reporter: Ray Burgemeestre
>            Priority: Minor
>              Labels: docker, kubernetes, spark
>             Fix For: 2.4.0
>
>
> *Context*
> PySpark support for Spark on k8s was merged a few days ago with 
> [https://github.com/apache/spark/pull/21092/files].
> There is a helper script that can be used to build Docker images for running 
> Java and now also Python jobs. It works like this:
> {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 build}}
> {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 push}}
> *Problem*
> I ran into three issues. The first time I generated images for 2.4.0, Docker 
> was using its cache, so when running jobs, old jars were still in the Docker 
> image. This produces errors like the following in the executors:
> {code:java}
> 2018-06-13 10:27:52 INFO NettyBlockTransferService:54 - Server created on 172.29.3.4:44877
> 2018-06-13 10:27:52 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
> 2018-06-13 10:27:52 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(1, 172.29.3.4, 44877, None)
> 2018-06-13 10:27:52 ERROR CoarseGrainedExecutorBackend:91 - Executor self-exiting due to : Unable to create executor due to Exception thrown in awaitResult:
> org.apache.spark.SparkException: Exception thrown in awaitResult:
>     at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>     at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
>     at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
>     at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:64)
>     at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:241)
>     at org.apache.spark.executor.Executor.<init>(Executor.scala:116)
>     at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)
>     at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
>     at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
>     at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
>     at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 6155820641931972169, local class serialVersionUID = -3720498261147521051
>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)
>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
> {code}
> To avoid that, Docker has to build without its cache, but only if you have 
> built images for an older version in the past.
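> A rough sketch of what building without the cache means at the Docker level 
> (the repository, tag, and Dockerfile path below are illustrative, not 
> necessarily what the script uses internally):
> {code:bash}
> # Rebuild the JVM image while ignoring previously cached layers, so stale
> # jars from an earlier build cannot end up in the new image.
> # (repo/tag and Dockerfile path are examples, not the script's internals)
> docker build --no-cache \
>   -t node001:5000/brightcomputing/spark:v2.4.0 \
>   -f kubernetes/dockerfiles/spark/Dockerfile .
> {code}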
> The second problem was that the spark image was pushed, but the spark-py 
> image wasn't yet; this was simply forgotten in the initial PR.
> (A third problem I also ran into, because I had an older Docker version, was 
> [https://github.com/apache/spark/pull/21551], so I have not included a fix 
> for that in this ticket.)
> Other than that it works great!
> *Solution*
> I've added an extra flag so it's possible to call build with {{-n}} for 
> {{--no-cache}}, and I've added the extra push for the spark-py image.
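> A minimal sketch of the kind of change, with illustrative variable names 
> rather than the exact patch:
> {code:bash}
> # Parse the new -n option alongside the existing ones (-f, -p, -m, -r, -t).
> NOCACHEARG=
> while getopts f:p:mr:t:n option; do
>   case "${option}" in
>     n) NOCACHEARG="--no-cache" ;;
>     # ... handling of the existing options stays as before ...
>   esac
> done
>
> # build: forward the flag to docker build for both images
> docker build $NOCACHEARG -t "$REPO/spark:$TAG" -f "$BASEDOCKERFILE" .
> docker build $NOCACHEARG -t "$REPO/spark-py:$TAG" -f "$PYDOCKERFILE" .
>
> # push: push the Python image in addition to the JVM image
> docker push "$REPO/spark:$TAG"
> docker push "$REPO/spark-py:$TAG"
> {code}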
> *Example*
> {{./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 -n build}}
> Snippet from the help output:
> {code:java}
> Options:
>   -f file  Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
>   -p file  Dockerfile with Python baked in. By default builds the Dockerfile shipped with Spark.
>   -r repo  Repository address.
>   -t tag   Tag to apply to the built image, or to identify the image to be pushed.
>   -m       Use minikube's Docker daemon.
>   -n       Build docker image with --no-cache
> {code}
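> For completeness, a full rebuild-and-push cycle with the new flag would look 
> like this (repository name is just an example):
> {code:bash}
> # rebuild both the spark and spark-py images, ignoring the layer cache
> ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.4.0 -n build
>
> # push both images; spark-py is now pushed alongside spark
> ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.4.0 push
> {code}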



