[ 
https://issues.apache.org/jira/browse/TEZ-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated TEZ-4440:
-----------------------------
    Description: 
This affects Hadoop versions that predate YARN-8933. When a Tez application 
runs in a YARN federation cluster, getAvailableResources may return null, 
which then causes an NPE.
{code:java}
2022-08-03 01:40:12,069 [ERROR] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Got Error from RMClient
java.lang.NullPointerException
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428)
2022-08-03 01:40:12,075 [ERROR] [AMRM Callback Handler Thread] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[AMRM Callback Handler Thread,5,main] threw an Exception.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:432)
Caused by: java.lang.NullPointerException
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428)
{code}
In YARN federation, AMRMProxy connects to multiple RMs asynchronously, so 
AllocateResponse::getAvailableResources may return null, which leads to the 
NPE above.

In my PR, I replace the null with Resource.newInstance(0, 0). Because null may 
mean that YARN is busy, reporting zero available resources is reasonable. 
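
A minimal, self-contained sketch of the null-to-zero fallback described above. 
It does not use the Hadoop or Tez classes: SimpleResource, headroomOrZero, and 
the simplified fitsIn are hypothetical stand-ins for illustration only, and 
this is not the code from the PR.

```java
class NullSafeHeadroom {

    /** Hypothetical stand-in for YARN's Resource record (memory and vcores only). */
    static final class SimpleResource {
        final long memoryMb;
        final int vcores;

        private SimpleResource(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        static SimpleResource newInstance(long memoryMb, int vcores) {
            return new SimpleResource(memoryMb, vcores);
        }
    }

    /** Treats a missing headroom report as "no free resources right now". */
    static SimpleResource headroomOrZero(SimpleResource reported) {
        return reported != null ? reported : SimpleResource.newInstance(0, 0);
    }

    /** Simplified check (assumption): every dimension of needed fits in available. */
    static boolean fitsIn(SimpleResource needed, SimpleResource available) {
        return needed.memoryMb <= available.memoryMb
            && needed.vcores <= available.vcores;
    }

    public static void main(String[] args) {
        // Simulates an allocate response that carries no available-resources value.
        SimpleResource reported = null;

        // Without the guard, fitsIn would dereference null and throw an NPE.
        SimpleResource headroom = headroomOrZero(reported);

        // A 1 GB, 1 vcore request does not fit into zero headroom.
        System.out.println(fitsIn(SimpleResource.newInstance(1024, 1), headroom)); // prints "false"
    }
}
```

With the fallback in place, a null report is handled the same way as a fully 
occupied cluster, so the caller can continue without a separate null branch.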

 

 

  was:
When a Tez application runs in a YARN federation cluster, 
getAvailableResources may return null, which then causes an NPE.
{code:java}
2022-08-03 01:40:12,069 [ERROR] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Got Error from RMClient
java.lang.NullPointerException
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428)
2022-08-03 01:40:12,075 [ERROR] [AMRM Callback Handler Thread] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[AMRM Callback Handler Thread,5,main] threw an Exception.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:432)
Caused by: java.lang.NullPointerException
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.fitsIn(YarnTaskSchedulerService.java:1445)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptIfNeeded(YarnTaskSchedulerService.java:1218)
    at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getProgress(YarnTaskSchedulerService.java:916)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:428)
{code}
In YARN federation, AMRMProxy connects to multiple RMs asynchronously, so 
AllocateResponse::getAvailableResources may return null, which leads to the 
NPE above.

In my PR, I replace the null with Resource.newInstance(0, 0). Because null may 
mean that YARN is busy, reporting zero available resources is reasonable. 


> When tez app run in yarn fed cluster, may throw NPE
> ---------------------------------------------------
>
>                 Key: TEZ-4440
>                 URL: https://issues.apache.org/jira/browse/TEZ-4440
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: zhengchenyu
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>



