[jira] [Updated] (SPARK-33780) YARN doesn't know about resource yarn.io/gpu

Bruno Faustino Amorim (Jira) Mon, 14 Dec 2020 11:02:05 -0800


[ https://issues.apache.org/jira/browse/SPARK-33780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Bruno Faustino Amorim updated SPARK-33780:
------------------------------------------
    Description: 
An error occurs when executing Spark on GPU. The stack trace is below:
{code:java}
20/12/14 18:39:41 WARN ResourceRequestHelper: YARN doesn't know about resource yarn.io/gpu, your resource discovery has to handle properly discovering and isolating the resource! Error: The resource manager encountered a problem that should not occur under normal circumstances. Please report this error to the Hadoop community by opening a JIRA ticket at http://issues.apache.org/jira and including the following information:
* Resource type requested: yarn.io/gpu
* Resource object: <memory:896, vCores:1>
* The stack trace for this exception: java.lang.Exception
    at org.apache.hadoop.yarn.exceptions.ResourceNotFoundException.<init>(ResourceNotFoundException.java:47)
    at org.apache.hadoop.yarn.api.records.Resource.getResourceInformation(Resource.java:268)
    at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.setResourceInformation(ResourcePBImpl.java:198)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ResourceRequestHelper$.$anonfun$setResourceRequests$4(ResourceRequestHelper.scala:183)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:128)
    at org.apache.spark.deploy.yarn.ResourceRequestHelper$.setResourceRequests(ResourceRequestHelper.scala:170)
    at org.apache.spark.deploy.yarn.Client.createApplicationSubmissionContext(Client.scala:277)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:196)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
After encountering this error, the resource manager is in an inconsistent state. It is safe for the resource manager to be restarted as the error encountered should be transitive. If high availability is enabled, failing over to a standby resource manager is also safe.
20/12/14 18:39:46 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
{code}
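The warning says the ResourceManager has no resource type named `yarn.io/gpu` registered. Spark's YARN documentation states that Spark translates its `gpu` and `fpga` resources into the YARN built-in types `yarn.io/gpu` and `yarn.io/fpga`. YARN only recognizes such a type if it is listed under `yarn.resource-types` in `resource-types.xml`.

The following is a minimal diagnostic sketch, not part of Spark or Hadoop. It assumes a standard Hadoop-style `<configuration><property>…` XML file. It only considers the auto-translated `spark.{driver,executor}.resource.{gpu,fpga}.amount` keys and ignores deploy mode. The helper names `declared_resource_types` and `missing_yarn_resource_types` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Spark translates only these resource names into YARN built-in resource types.
SPARK_TO_YARN = {"gpu": "yarn.io/gpu", "fpga": "yarn.io/fpga"}


def declared_resource_types(resource_types_xml: str) -> set:
    """Return the types listed under yarn.resource-types in a Hadoop-style config XML string."""
    root = ET.fromstring(resource_types_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "yarn.resource-types":
            value = prop.findtext("value") or ""
            return {t.strip() for t in value.split(",") if t.strip()}
    return set()


def missing_yarn_resource_types(spark_conf: dict, resource_types_xml: str) -> set:
    """Hypothetical helper: YARN types implied by spark.*.resource.<gpu|fpga>.amount but not declared."""
    requested = set()
    for key in spark_conf:
        parts = key.split(".")
        # Match keys shaped like spark.<driver|executor>.resource.<name>.amount
        if (len(parts) == 5 and parts[0] == "spark" and parts[2] == "resource"
                and parts[4] == "amount" and parts[3] in SPARK_TO_YARN):
            requested.add(SPARK_TO_YARN[parts[3]])
    return requested - declared_resource_types(resource_types_xml)


# Usage with the settings from this report and a resource-types.xml that declares nothing.
conf = {
    "spark.driver.resource.gpu.discoveryScript": "/opt/spark/getGpusResources.sh",
    "spark.driver.resource.gpu.amount": "1",
    "spark.rapids.sql.enabled": "ALL",
}
print(missing_yarn_resource_types(conf, "<configuration></configuration>"))  # {'yarn.io/gpu'}
```

If the returned set is non-empty, the cluster's YARN configuration, not Spark, is the first place to look. Where that file lives depends on the distribution.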
 
 



> YARN doesn't know about resource yarn.io/gpu
> --------------------------------------------
>
>                 Key: SPARK-33780
>                 URL: https://issues.apache.org/jira/browse/SPARK-33780
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 3.0.1
>         Environment: Amazon EMR: emr-6.2.0
>  Spark Version: Spark 3.0.1
>  Instance Type: g3.4xlarge
>  AMI Name: emr-6_2_0-image-builder-ami-hvm-x86_64 2020-11-01T00-56-10.917Z
> Spark Configs:
> {code:python}
> from pyspark import SparkConf
>
> sc_conf = SparkConf() \
>     .set('spark.driver.resource.gpu.discoveryScript', '/opt/spark/getGpusResources.sh') \
>     .set('spark.driver.resource.gpu.amount', '1') \
>     .set('spark.rapids.sql.enabled', 'ALL'){code}
>  
>            Reporter: Bruno Faustino Amorim
>            Priority: Trivial
>

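The config above points `spark.driver.resource.gpu.discoveryScript` at `/opt/spark/getGpusResources.sh`, whose contents are not shown in the report. Spark's resource-scheduling documentation requires such a script to print to STDOUT a JSON object in the format of Spark's `ResourceInformation`: a `name` plus an array of string `addresses`.

The following is a minimal Python sketch of an equivalent script. It assumes GPU indices come from `nvidia-smi --query-gpu=index --format=csv,noheader` (one index per line), similar to the example script shipped with Spark. The function names are hypothetical.

```python
import json
import shutil
import subprocess


def gpu_discovery_json(nvidia_smi_output: str) -> str:
    """Build the discovery JSON: resource name plus GPU indices as string addresses."""
    addresses = [line.strip() for line in nvidia_smi_output.splitlines() if line.strip()]
    return json.dumps({"name": "gpu", "addresses": addresses})


def discover_gpus() -> str:
    """Query nvidia-smi for GPU indices (assumes NVIDIA driver tools may be installed)."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA tooling on this host: report an empty address list.
        return gpu_discovery_json("")
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return gpu_discovery_json(out)


if __name__ == "__main__":
    # A discovery script only needs to print this JSON to STDOUT.
    print(discover_gpus())
```

For example, `gpu_discovery_json("0\n1\n")` returns `{"name": "gpu", "addresses": ["0", "1"]}`. Splitting parsing from the `nvidia-smi` call keeps the JSON format testable on hosts without a GPU.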


--
This message was sent by Atlassian Jira
(v8.3.4#803005)

