[ https://issues.apache.org/jira/browse/SPARK-32411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194485#comment-17194485 ]

Thomas Graves commented on SPARK-32411:
---------------------------------------

[~chitralverma] if you are still having an issue, please file one at 
[https://github.com/NVIDIA/spark-rapids/issues]

> GPU Cluster Fail
> ----------------
>
>                 Key: SPARK-32411
>                 URL: https://issues.apache.org/jira/browse/SPARK-32411
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Web UI
>    Affects Versions: 3.0.0
>         Environment: I have an Apache Spark 3.0 cluster consisting of 
> machines with multiple NVIDIA GPUs, and I connect my Jupyter notebook to the 
> cluster using PySpark.
>            Reporter: Vinh Tran
>            Priority: Major
>
> I'm having a difficult time getting a GPU cluster started on Apache Spark 
> 3.0. It was hard to find documentation on this, but I stumbled upon an 
> NVIDIA GitHub page for RAPIDS, which suggested the following additional 
> edits to spark-defaults.conf:
> {code:java}
> spark.task.resource.gpu.amount 0.25
> spark.executor.resource.gpu.discoveryScript ./usr/local/spark/getGpusResources.sh
> {code}
> I have an Apache Spark 3.0 cluster consisting of machines with multiple 
> NVIDIA GPUs, and I connect my Jupyter notebook to the cluster using PySpark; 
> however, this results in the following error: 
> {code:java}
> Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
> : org.apache.spark.SparkException: You must specify an amount for gpu
>       at org.apache.spark.resource.ResourceUtils$.$anonfun$parseResourceRequest$1(ResourceUtils.scala:142)
>       at scala.collection.immutable.Map$Map1.getOrElse(Map.scala:119)
>       at org.apache.spark.resource.ResourceUtils$.parseResourceRequest(ResourceUtils.scala:142)
>       at org.apache.spark.resource.ResourceUtils$.$anonfun$parseAllResourceRequests$1(ResourceUtils.scala:159)
>       at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>       at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75)
>       at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>       at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>       at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>       at org.apache.spark.resource.ResourceUtils$.parseAllResourceRequests(ResourceUtils.scala:159)
>       at org.apache.spark.SparkContext$.checkResourcesPerTask$1(SparkContext.scala:2773)
>       at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2884)
>       at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
>       at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>       at py4j.Gateway.invoke(Gateway.java:238)
>       at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
>       at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
>       at py4j.GatewayConnection.run(GatewayConnection.java:238)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
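> Reading the message at face value, Spark appears to want an executor-level 
> GPU amount to accompany the task-level one, i.e. a line along these lines 
> (the value here is only an illustration):
> {code:java}
> # illustrative value: one GPU per executor
> spark.executor.resource.gpu.amount 1
> {code}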
> After this, I tried adding another line to the conf per the instructions, 
> which results in no errors; however, when I log in to the Web UI at 
> localhost:8080, the application state under Running Applications remains 
> WAITING.
> {code:java}
> spark.task.resource.gpu.amount                2
> spark.executor.resource.gpu.discoveryScript   ./usr/local/spark/getGpusResources.sh
> spark.executor.resource.gpu.amount            1
> {code}
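> One thing worth noting about the config above: each task requests 2 GPUs 
> while each executor provides only 1, so no executor can ever satisfy a 
> single task's request, which by itself could leave the application stuck. 
> A self-consistent sketch (values are illustrative) would keep the task 
> amount at or below the executor amount:
> {code:java}
> # illustrative values; adjust to the GPUs actually present per machine
> spark.executor.resource.gpu.amount            1
> spark.task.resource.gpu.amount                1
> # assumes the leading "./" in the original path was unintended,
> # since "./usr/..." resolves relative to the working directory
> spark.executor.resource.gpu.discoveryScript   /usr/local/spark/getGpusResources.sh
> {code}
> In standalone mode the workers presumably also have to advertise their GPUs 
> (spark.worker.resource.gpu.amount plus a worker-side discovery script) 
> before the master can grant them to an application; that is an assumption 
> based on Spark's resource-scheduling documentation, not something verified 
> here.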
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
