It may help to check this article of mine

Spark on Kubernetes, A Practitioner’s Guide


On Wed, 15 Feb 2023 at 09:12, Mich Talebzadeh

> Your submit command
> spark-submit --master k8s:// --deploy-mode
> cluster --name pyspark-example --conf 
> spark.kubernetes.container.image=pyspark-example:0.1
> --conf spark.kubernetes.file.upload.path=/myexample
> src/
> Pay attention to this setting:
> --conf spark.kubernetes.file.upload.path
> It refers to the location on GCS storage where your Python package is
> uploaded, not to a path inside the Docker image itself.
> From the Spark documentation on running on Kubernetes:
> "... The app jar file will be uploaded to the S3 and then when the driver
> is launched it will be downloaded to the driver pod and will be added to
> its classpath. Spark will generate a subdir under the upload path with a
> random name to avoid conflicts with spark apps running in parallel. User
> could manage the subdirs created according to his needs..."
> In your case it is gs, not s3.
> There is no point in putting your Python file in the Docker image itself!
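
To illustrate the point about the upload path, here is a minimal sketch in Python that assembles the same spark-submit arguments but refuses an upload path that is not on shared storage. The helper name `build_submit_command`, the set of accepted schemes, the bucket `gs://my-bucket/spark-uploads`, the master URL placeholder and the file name `src/app.py` are all assumptions made for illustration; none of them come from the original command. The scheme check is a guard added in this sketch, not something spark-submit itself enforces.

```python
from urllib.parse import urlparse

# Schemes treated here as shared storage reachable from both the submitting
# machine and the driver pod. This list is an assumption for the sketch.
REMOTE_SCHEMES = {"gs", "s3a", "hdfs"}


def build_submit_command(master, image, upload_path, app_file):
    """Assemble spark-submit arguments, rejecting a non-remote upload path."""
    scheme = urlparse(upload_path).scheme
    if scheme not in REMOTE_SCHEMES:
        raise ValueError(
            f"upload path {upload_path!r} is not on shared storage; "
            "expected something like gs://<bucket>/<prefix>"
        )
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",
        "--name", "pyspark-example",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.kubernetes.file.upload.path={upload_path}",
        app_file,
    ]


if __name__ == "__main__":
    cmd = build_submit_command(
        master="k8s://https://<api-server-host>:<port>",  # placeholder, not a real endpoint
        image="pyspark-example:0.1",
        upload_path="gs://my-bucket/spark-uploads",  # hypothetical bucket
        app_file="src/app.py",  # hypothetical file name
    )
    print(" ".join(cmd))
```

With `/myexample` as the upload path the helper raises a `ValueError`. That mirrors why the original submission failed: the path was resolved on the local filesystem of the submitting machine (the trace shows `RawLocalFileSystem`), where that directory could not be created.
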
On Wed, 15 Feb 2023 at 07:46, karan alang
>> Hi Ye,
>> This is the error I get when I don't set
>> spark.kubernetes.file.upload.path.
>> Any ideas on how to fix this?
>> ```
>> Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
>> at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
>> at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
>> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>> at
>> at$(TraversableLike.scala:231)
>> at
>> at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
>> at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:173)
>> at scala.collection.immutable.List.foreach(List.scala:392)
>> at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:164)
>> at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
>> at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>> at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>> at scala.collection.immutable.List.foldLeft(List.scala:89)
>> at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>> at
>> at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
>> at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
>> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
>> at
>> at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
>> at $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> ```
On Tue, Feb 14, 2023 at 1:33 AM Ye Xianjin
>>> The configuration of ‘…file.upload.path’ is wrong. It is meant to be a
>>> distributed filesystem path where your archives, resources and jars are
>>> stored temporarily before Spark distributes them to the drivers and executors.
>>> In your case, you don’t need to set this configuration.
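
The rule behind this advice can be sketched as follows, assuming Spark's convention that a `local://` URI names a file already present in the container image, while a path with no scheme (or `file://`) names a file on the submitting machine that has to be uploaded first. The function name `needs_upload_path` and the file name `app.py` are hypothetical; `/myexample` is the directory from the Dockerfile in the original question.

```python
from urllib.parse import urlparse


def needs_upload_path(app_resource):
    """Return True when the resource lives on the submitting machine.

    A missing scheme or file:// is treated as a client-side file that Spark
    would have to upload; any other scheme (local://, gs://, ...) is treated
    as already reachable from the driver pod. This is a simplification.
    """
    scheme = urlparse(app_resource).scheme
    return scheme in ("", "file")


if __name__ == "__main__":
    examples = [
        "src/app.py",                  # client-side path (hypothetical file name)
        "local:///myexample/app.py",   # file baked into the image
        "gs://my-bucket/app.py",       # hypothetical bucket
    ]
    for resource in examples:
        print(f"{resource}: needs upload path = {needs_upload_path(resource)}")
```

Under that assumption, pointing spark-submit at a `local://` URI inside the image is what makes the upload path unnecessary, whereas passing a client-side path still triggers the "Please specify spark.kubernetes.file.upload.path property" error.
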
On Feb 14, 2023, at 5:43 AM, karan alang
>>> Hello All,
>>> I'm trying to run a simple application on GKE (Kubernetes), and it is
>>> failing.
>>> Note: I have Spark (Bitnami Spark chart) installed on GKE using helm
>>> install.
>>> Here is what I have done:
>>> 1. Created a Docker image using the following Dockerfile:
>>> ```
>>> FROM python:3.7-slim
>>> RUN apt-get update && \
>>>     apt-get install -y default-jre && \
>>>     apt-get install -y openjdk-11-jre-headless && \
>>>     apt-get clean
>>> ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
>>> RUN pip install pyspark
>>> RUN mkdir -p /myexample && chmod 755 /myexample
>>> WORKDIR /myexample
>>> COPY src/ /myexample/
>>> CMD ["pyspark"]
>>> ```
>>> Simple PySpark application:
>>> ```
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.appName("StructuredStreaming-on-gke").getOrCreate()
>>> data = [('k1', 123000), ('k2', 234000), ('k3', 456000)]
>>> df = spark.createDataFrame(data, ('id', 'salary'))
>>> df.show(truncate=False)
>>> ```
>>> spark-submit command:
>>> ```
>>> spark-submit --master k8s:// --deploy-mode
>>> cluster --name pyspark-example --conf
>>> spark.kubernetes.container.image=pyspark-example:0.1 --conf
>>> spark.kubernetes.file.upload.path=/myexample src/
>>> ```
>>> Error I get:
>>> ```
>>> 23/02/13 13:18:27 INFO KubernetesUtils: Uploading file: /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/ to dest: /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a/
>>> Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/ failed...
>>> at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:296)
>>> at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
>>> at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
>>> at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
>>> at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
>>> at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>>> at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>>> at scala.collection.immutable.List.foldLeft(List.scala:89)
>>> at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>>> at
>>> at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
>>> at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
>>> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
>>> at
>>> at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
>>> at $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>>> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>>> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>>> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>>> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> Caused by: org.apache.spark.SparkException: Error uploading file
>>> at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:319)
>>> at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:292)
>>> ... 21 more
>>> Caused by: Mkdirs failed to create /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a
>>> at org.apache.hadoop.fs.RawLocalFileSystem.create(
>>> at org.apache.hadoop.fs.RawLocalFileSystem.create(
>>> at org.apache.hadoop.fs.FileSystem.create(
>>> at org.apache.hadoop.fs.FileSystem.create(
>>> at org.apache.hadoop.fs.FileUtil.copy(
>>> at org.apache.hadoop.fs.FileUtil.copy(
>>> at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(
>>> at org.apache.hadoop.fs.FilterFileSystem.copyFromLocalFile(
>>> at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:316)
>>> ... 22 more
>>> ```
>>> Any ideas on how to fix this and get it to work?
