Bruno Faustino Amorim created SPARK-33780:
---------------------------------------------
Summary: YARN doesn't know about resource yarn.io/gpu
Key: SPARK-33780
URL: https://issues.apache.org/jira/browse/SPARK-33780
Project: Spark
Issue Type: Bug
Components: EC2
Affects Versions: 3.0.1
Environment: Amazon EMR: emr-6.2.0
Spark Version: Spark 3.0.1
Instance Type: g3.4xlarge
AMI Name: emr-6_2_0-image-builder-ami-hvm-x86_64 2020-11-01T00-56-10.917Z
Spark Configs:
{code:java}
sc_conf = SparkConf() \
    .set('spark.driver.resource.gpu.discoveryScript', '/opt/spark/getGpusResources.sh') \
    .set('spark.driver.resource.gpu.amount', '1') \
    .set('spark.rapids.sql.enabled', 'ALL')
{code}
Reporter: Bruno Faustino Amorim
Error when executing Spark on GPU. The stack trace is below:
{code:java}
20/12/14 18:39:41 WARN ResourceRequestHelper: YARN doesn't know about resource yarn.io/gpu, your resource discovery has to handle properly discovering and isolating the resource! Error: The resource manager encountered a problem that should not occur under normal circumstances. Please report this error to the Hadoop community by opening a JIRA ticket at http://issues.apache.org/jira and including the following information:
* Resource type requested: yarn.io/gpu
* Resource object: <memory:896, vCores:1>
* The stack trace for this exception: java.lang.Exception
    at org.apache.hadoop.yarn.exceptions.ResourceNotFoundException.<init>(ResourceNotFoundException.java:47)
    at org.apache.hadoop.yarn.api.records.Resource.getResourceInformation(Resource.java:268)
    at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.setResourceInformation(ResourcePBImpl.java:198)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ResourceRequestHelper$.$anonfun$setResourceRequests$4(ResourceRequestHelper.scala:183)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:128)
    at org.apache.spark.deploy.yarn.ResourceRequestHelper$.setResourceRequests(ResourceRequestHelper.scala:170)
    at org.apache.spark.deploy.yarn.Client.createApplicationSubmissionContext(Client.scala:277)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:196)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
After encountering this error, the resource manager is in an inconsistent state. It is safe for the resource manager to be restarted as the error encountered should be transitive. If high availability is enabled, failing over to a standby resource manager is also safe.
20/12/14 18:39:46 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!{code}
This exception happens when starting Spark on GPU.
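For reference, below is a minimal sketch of a fuller PySpark GPU request on YARN, based on the Spark 3.0 resource-scheduling configuration keys. It is not taken from the cluster above: the executor and task amounts and the reuse of /opt/spark/getGpusResources.sh for the executors are assumptions for illustration only. Even with a configuration like this on the Spark side, the warning above states that YARN itself does not know about the yarn.io/gpu resource type.
{code:python}
# Hedged sketch, not the reporter's configuration: requests one GPU for the
# driver and for each executor, and one GPU per task. The discovery-script
# path and the amounts are illustrative assumptions.
from pyspark import SparkConf, SparkContext

sc_conf = SparkConf() \
    .set('spark.driver.resource.gpu.amount', '1') \
    .set('spark.driver.resource.gpu.discoveryScript', '/opt/spark/getGpusResources.sh') \
    .set('spark.executor.resource.gpu.amount', '1') \
    .set('spark.executor.resource.gpu.discoveryScript', '/opt/spark/getGpusResources.sh') \
    .set('spark.task.resource.gpu.amount', '1')

# Building the context triggers the YARN resource request shown in the log above.
sc = SparkContext(conf=sc_conf)
{code}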
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]