Nikolay Dimolarov created SPARK-31165:
-----------------------------------------
Summary: Multiple wrong references in Dockerfile for k8s
Key: SPARK-31165
URL: https://issues.apache.org/jira/browse/SPARK-31165
Project: Spark
Issue Type: Bug
Components: Kubernetes, Spark Core
Affects Versions: 3.0.0
Reporter: Nikolay Dimolarov
I am currently trying to follow the k8s instructions for Spark:
[https://spark.apache.org/docs/latest/running-on-kubernetes.html]. After
cloning the master branch of apache/spark from GitHub and trying to build my
Docker image, I found multiple wrong folder references:
*Issue 1: The comments in the Dockerfile reference the wrong folder for the
Dockerfile:*
{code:java}
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .{code}
Well that docker build command simply won't run. I only got the following to
run:
{code:java}
docker build -t spark:latest \
  -f resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile .
{code}
which is the actual path to the Dockerfile.
*Issue 2: jars folder does not exist*
After reading the tutorial, I of course built Spark first as per the
instructions with:
{code:java}
./build/mvn -Pkubernetes -DskipTests clean package{code}
Nonetheless, I get this error when building the Docker image:
{code:java}
Step 8/18 : COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY failed: stat
/var/lib/docker/tmp/docker-builder638219776/kubernetes/dockerfiles/spark/entrypoint.sh:
no such file or directory{code}
for which I may have found a similar issue here:
[https://stackoverflow.com/questions/52451538/spark-for-kubernetes-test-on-mac]
I am new to Spark, but I assumed that the jars folder would exist in the root
folder of the project once the build step had created it. I did run the Maven
build of the master branch successfully with the command mentioned above, yet
the folder is not there.
*Issue 3: missing entrypoint.sh and decom.sh due to wrong reference*
Issue 2 remains unresolved, as I can't wrap my head around the missing jars
folder (bin and sbin got copied successfully after I made a dummy jars
folder). I then got stuck on these 2 steps:
{code:java}
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY kubernetes/dockerfiles/spark/decom.sh /opt/{code}
with:
{code:java}
Step 8/18 : COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY failed: stat
/var/lib/docker/tmp/docker-builder638219776/kubernetes/dockerfiles/spark/entrypoint.sh:
no such file or directory{code}
which makes sense since the path should actually be:
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/decom.sh
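A quick way to see which of the paths referenced by the Dockerfile's COPY steps
are actually present in a given build context is a small shell helper. This is
only a sketch: the function name missing_paths is hypothetical and not part of
Spark.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): print every relative path that
# does not exist under the given root directory.
# Usage: missing_paths ROOT PATH...
missing_paths() {
  root=$1
  shift
  for rel in "$@"; do
    # -e is true for files and directories alike
    [ -e "$root/$rel" ] || printf '%s\n' "$rel"
  done
}
```

Run from the directory that is passed to docker build as the context, e.g.
`missing_paths . jars bin sbin kubernetes/dockerfiles/spark/entrypoint.sh kubernetes/dockerfiles/spark/decom.sh`.
Any path it prints is one that a COPY step would fail on.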
*Remark*
I only created one issue since this seems like somebody cleaned up the repo and
forgot to change these. Am I missing something here? If I am, I apologise in
advance since I am new to the Spark project. I also saw that some of these
references were handled through vars in previous branches:
[https://github.com/apache/spark/blob/branch-2.4/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile]
(e.g. 2.4) but that also does not run out of the box.
I am also really not sure about the affected versions since that was not
transparent enough for me on GH - feel free to edit that field :)
I can also create a PR and change these but I need help with Issue 2 and the
jar files since I am not sure what the correct path for that one is. Would love
some help on this :)
Thanks in advance!