[jira] [Resolved] (SPARK-33780) YARN doesn't know about resource yarn.io/gpu

Bruno Faustino Amorim (Jira) Mon, 14 Dec 2020 18:33:36 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-33780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Bruno Faustino Amorim resolved SPARK-33780.
-------------------------------------------
    Resolution: Not A Bug

To use GPUs with Spark on EMR, the required configuration must be applied 
when the cluster is created. 
Documentation link: 
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html
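
As a rough illustration of what cluster-creation configuration can look like, 
the sketch below builds an EMR-style configurations list that registers the 
yarn.io/gpu resource type with YARN. The "yarn-site" classification and the 
two Hadoop properties shown are assumptions taken from the general Hadoop GPU 
setup, and the helper name gpu_yarn_configurations is hypothetical. The 
documentation linked above lists the full set of settings EMR actually 
requires, so treat this as a starting point rather than a complete 
configuration.

```python
import json


def gpu_yarn_configurations():
    """Return a minimal EMR-style configurations list (hypothetical helper).

    Assumption: registering yarn.io/gpu as a resource type and enabling the
    NodeManager GPU plugin is what lets YARN recognize the resource that
    Spark requests. Check the EMR documentation for the complete settings.
    """
    return [
        {
            "Classification": "yarn-site",
            "Properties": {
                # Tell the ResourceManager that yarn.io/gpu is a known resource type.
                "yarn.resource-types": "yarn.io/gpu",
                # Enable the GPU resource plugin on each NodeManager.
                "yarn.nodemanager.resource-plugins": "yarn.io/gpu",
            },
        }
    ]


if __name__ == "__main__":
    # Print JSON that could be saved to a file and supplied at cluster creation.
    print(json.dumps(gpu_yarn_configurations(), indent=2))
```

The printed JSON would be supplied as the cluster's configuration at creation 
time; changing Spark settings alone, as in the report below, does not make 
YARN aware of the GPU resource.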

> YARN doesn't know about resource yarn.io/gpu
> --------------------------------------------
>
>                 Key: SPARK-33780
>                 URL: https://issues.apache.org/jira/browse/SPARK-33780
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 3.0.1
>         Environment: Amazon EMR: emr-6.2.0
>  Spark Version: Spark 3.0.1
> Instance Type: g3.4xlarge
>  AMI Name: emr-6_2_0-image-builder-ami-hvm-x86_64 2020-11-01T00-56-10.917Z
> Spark Configs:
> {code:java}
> sc_conf = SparkConf() \
>  .set('spark.driver.resource.gpu.discoveryScript', 
> '/opt/spark/getGpusResources.sh') \
>  .set('spark.driver.resource.gpu.amount', '1') \
>  .set('spark.rapids.sql.enabled', 'ALL'){code}
>  
>            Reporter: Bruno Faustino Amorim
>            Priority: Trivial
>
> Error when executing Spark on GPU. The stack trace is below:
> {code:java}
> 20/12/14 18:39:41 WARN ResourceRequestHelper: YARN doesn't know about 
> resource yarn.io/gpu, your resource discovery has to handle properly 
> discovering and isolating the resource! Error: The resource manager 
> encountered a problem that should not occur under normal circumstances. 
> Please report this error to the Hadoop community by opening a JIRA ticket at 
> http://issues.apache.org/jira and including the following information:
> 20/12/14 18:39:41 WARN ResourceRequestHelper: YARN doesn't know about 
> resource yarn.io/gpu, your resource discovery has to handle properly 
> discovering and isolating the resource! Error: The resource manager 
> encountered a problem that should not occur under normal circumstances. 
> Please report this error to the Hadoop community by opening a JIRA ticket at 
> http://issues.apache.org/jira and including the following information:
> * Resource type requested: yarn.io/gpu
> * Resource object: <memory:896, vCores:1>
> * The stack trace for this exception: java.lang.Exception
>     at org.apache.hadoop.yarn.exceptions.ResourceNotFoundException.<init>(ResourceNotFoundException.java:47)
>     at org.apache.hadoop.yarn.api.records.Resource.getResourceInformation(Resource.java:268)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.setResourceInformation(ResourcePBImpl.java:198)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.yarn.ResourceRequestHelper$.$anonfun$setResourceRequests$4(ResourceRequestHelper.scala:183)
>     at scala.collection.immutable.Map$Map1.foreach(Map.scala:128)
>     at org.apache.spark.deploy.yarn.ResourceRequestHelper$.setResourceRequests(ResourceRequestHelper.scala:170)
>     at org.apache.spark.deploy.yarn.Client.createApplicationSubmissionContext(Client.scala:277)
>     at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:196)
>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
>     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>     at py4j.Gateway.invoke(Gateway.java:238)
>     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
>     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
>     at py4j.GatewayConnection.run(GatewayConnection.java:238)
>     at java.lang.Thread.run(Thread.java:748)
> After encountering this error, the resource manager is in an inconsistent 
> state. It is safe for the resource manager to be restarted as the error 
> encountered should be transitive. If high availability is enabled, failing 
> over to a standby resource manager is also safe.
> 20/12/14 18:39:46 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted 
> to request executors before the AM has registered!{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

