[ 
https://issues.apache.org/jira/browse/TEZ-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560265#comment-14560265
 ] 

Hitesh Shah commented on TEZ-2488:
----------------------------------

The fix would be to never end up going to the RM for invalid data. On 
registering with the RM, it sends back RegisterApplicationMasterResponse in the 
registerAppMaster() call. The response object has getMaximumResourceCapability 
which can be used to do basic checks for resources being requested before 
making the request.

By doing this check in say DAG initialization we can fail the dag before making 
any allocation request calls to the RM. The check though would need to be done 
for all the vertices ( and the configured task settings ). If we enhance the 
VertexManager at some point, this check will need to be done everytime the 
VertexManager modifies the resources needed and throw an error back to the VM 
in such cases. 

 

> Tez AM crashes if a submitted DAG is configured to use invalid resource 
> sizes. 
> -------------------------------------------------------------------------------
>
>                 Key: TEZ-2488
>                 URL: https://issues.apache.org/jira/browse/TEZ-2488
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Priority: Critical
>         Attachments: applogs.txt
>
>
> 2015-05-26 21:54:03,485 ERROR [AMRM Heartbeater thread] 
> impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=682, maxMemory=512
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:249)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:226)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:234)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:98)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:505)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>       at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>       at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>       at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2015-05-26 21:54:03,495 INFO [Dispatcher thread: Central] app.DAGAppMaster: 
> Error in the TaskScheduler. Shutting down.
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=682, maxMemory=512
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:249)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:226)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:234)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:98)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:505)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>       at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to