[
https://issues.apache.org/jira/browse/SPARK-33780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bruno Faustino Amorim resolved SPARK-33780.
-------------------------------------------
Resolution: Not A Bug
To use GPUs with Spark on EMR, you need to apply the required configuration when creating the cluster.
Documentation link:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html
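
For illustration only, here is a rough sketch of the kind of creation-time configuration that makes YARN aware of yarn.io/gpu. It is not copied from the documentation above. The helper name build_gpu_configurations and the chosen values are assumptions. The classification names are EMR's, the property keys are standard Hadoop and Spark settings, and the discovery script path is the one from the report. Check the linked page for the exact settings for your EMR release.

```python
import json


def build_gpu_configurations(gpus_per_executor=1):
    """Hypothetical helper: build an EMR "Configurations" list for GPU scheduling.

    All values are illustrative assumptions, not a verified EMR setup.
    """
    return [
        {
            # Register yarn.io/gpu as a resource type and enable the
            # NodeManager GPU plugin so YARN can discover the devices.
            "Classification": "yarn-site",
            "Properties": {
                "yarn.resource-types": "yarn.io/gpu",
                "yarn.nodemanager.resource-plugins": "yarn.io/gpu",
                "yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices": "auto",
            },
        },
        {
            # The default resource calculator only considers memory;
            # the dominant one also accounts for other resource types.
            "Classification": "capacity-scheduler",
            "Properties": {
                "yarn.scheduler.capacity.resource-calculator":
                    "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
            },
        },
        {
            # Spark-side resource requests; script path taken from the report.
            "Classification": "spark-defaults",
            "Properties": {
                "spark.executor.resource.gpu.amount": str(gpus_per_executor),
                "spark.task.resource.gpu.amount": "1",
                "spark.executor.resource.gpu.discoveryScript":
                    "/opt/spark/getGpusResources.sh",
            },
        },
    ]


if __name__ == "__main__":
    # Emit JSON that could be saved to a file and supplied at cluster creation.
    print(json.dumps(build_gpu_configurations(), indent=2))
```

The resulting JSON would be passed as the cluster's configuration at creation time. Setting the Spark properties alone, as in the report, leaves the resource manager unaware of the GPU resource type.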
> YARN doesn't know about resource yarn.io/gpu
> --------------------------------------------
>
> Key: SPARK-33780
> URL: https://issues.apache.org/jira/browse/SPARK-33780
> Project: Spark
> Issue Type: Bug
> Components: EC2
> Affects Versions: 3.0.1
> Environment: Amazon EMR: emr-6.2.0
> Spark Version: Spark 3.0.1
> Instance Type: g3.4xlarge
> AMI Name: emr-6_2_0-image-builder-ami-hvm-x86_64 2020-11-01T00-56-10.917Z
> Spark Configs:
> {code:java}
> sc_conf = SparkConf() \
>     .set('spark.driver.resource.gpu.discoveryScript', '/opt/spark/getGpusResources.sh') \
>     .set('spark.driver.resource.gpu.amount', '1') \
>     .set('spark.rapids.sql.enabled', 'ALL'){code}
>
> Reporter: Bruno Faustino Amorim
> Priority: Trivial
>
> Error when running Spark on GPU. The stack trace is below:
> {code:java}
> 20/12/14 18:39:41 WARN ResourceRequestHelper: YARN doesn't know about
> resource yarn.io/gpu, your resource discovery has to handle properly
> discovering and isolating the resource! Error: The resource manager
> encountered a problem that should not occur under normal circumstances.
> Please report this error to the Hadoop community by opening a JIRA ticket at
> http://issues.apache.org/jira and including the following information:
> * Resource type requested: yarn.io/gpu
> * Resource object: <memory:896, vCores:1>
> * The stack trace for this exception: java.lang.Exception
>   at org.apache.hadoop.yarn.exceptions.ResourceNotFoundException.<init>(ResourceNotFoundException.java:47)
>   at org.apache.hadoop.yarn.api.records.Resource.getResourceInformation(Resource.java:268)
>   at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.setResourceInformation(ResourcePBImpl.java:198)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.deploy.yarn.ResourceRequestHelper$.$anonfun$setResourceRequests$4(ResourceRequestHelper.scala:183)
>   at scala.collection.immutable.Map$Map1.foreach(Map.scala:128)
>   at org.apache.spark.deploy.yarn.ResourceRequestHelper$.setResourceRequests(ResourceRequestHelper.scala:170)
>   at org.apache.spark.deploy.yarn.Client.createApplicationSubmissionContext(Client.scala:277)
>   at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:196)
>   at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60)
>   at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
>   at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:238)
>   at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
>   at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
>   at py4j.GatewayConnection.run(GatewayConnection.java:238)
>   at java.lang.Thread.run(Thread.java:748)
> After encountering this error, the resource manager is in an inconsistent
> state. It is safe for the resource manager to be restarted as the error
> encountered should be transitive. If high availability is enabled, failing
> over to a standby resource manager is also safe.
> 20/12/14 18:39:46 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted
> to request executors before the AM has registered!{code}
>
>