[ 
https://issues.apache.org/jira/browse/FLINK-15534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011620#comment-17011620
 ] 

Yang Wang edited comment on FLINK-15534 at 1/9/20 11:38 AM:
------------------------------------------------------------

After diving into the YARN code, I found that this is a known YARN bug. See
[YARN-7007|https://issues.apache.org/jira/browse/YARN-7007].

It has been fixed in branch-2 of the Hadoop repository, but no new 2.8.x version
has been released since the fix was merged. I suggest closing this issue, since the NPE
happens inside the YARN ResourceManager itself; there is nothing we can do about it in Flink.
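The trace in the report shows why the failure is on the ResourceManager side. The top-level `java.lang.NullPointerException` carries the RM's own stack trace as its message, and its cause is an `org.apache.hadoop.ipc.RemoteException`. In other words, the NPE was thrown in the RM, and only its class name and text crossed the RPC boundary before being re-created on the client.

The Java sketch below is a deliberately simplified model of that round trip. All names in it (`RemoteUnwrapSketch`, `RemoteError`, `toWire`, `unwrap`) are hypothetical. It is not Hadoop's actual `RemoteException` or YARN's client-side RPC handling, which differ in detail.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.lang.reflect.Constructor;

/** Hypothetical, simplified model of how a server-side exception reaches an RPC client. */
public final class RemoteUnwrapSketch {

    /** Stand-in for an RPC-level error: only the class name and text cross the wire. */
    static final class RemoteError extends Exception {
        private final String className;

        RemoteError(String className, String message) {
            super(message);
            this.className = className;
        }

        String getClassName() {
            return className;
        }
    }

    /** Server side: the exception object is reduced to (class name, stringified stack trace). */
    static RemoteError toWire(Throwable serverSide) {
        StringWriter sw = new StringWriter();
        serverSide.printStackTrace(new PrintWriter(sw));
        return new RemoteError(serverSide.getClass().getName(), sw.toString());
    }

    /** Client side: re-create an exception of the same class and keep the wire error as its cause. */
    static Throwable unwrap(RemoteError re) {
        try {
            Class<?> cls = Class.forName(re.getClassName());
            Constructor<?> ctor = cls.getConstructor(String.class);
            Throwable t = (Throwable) ctor.newInstance(re.getMessage());
            t.initCause(re);
            return t;
        } catch (ReflectiveOperationException | ClassCastException e) {
            return re; // unknown or unsuitable class: surface the wire error itself
        }
    }

    public static void main(String[] args) {
        // An NPE "thrown in the RM" ends up as a client-side NPE whose cause is the wire error,
        // matching the shape of the report: NPE -> Caused by: RemoteException.
        Throwable client = unwrap(toWire(new NullPointerException()));
        System.out.println(client.getClass().getName());
        System.out.println(client.getCause() instanceof RemoteError);
    }
}
```

Because the client only sees a re-created exception, the frames that matter (`RMAppAttemptMetrics.getAggregateAppResourceUsage` and so on) all belong to the ResourceManager process, not to Flink.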

Once a new Hadoop 2.8 release (2.8.6) is out, we need to bump the
flink-shaded-hadoop version to 2.8.6. Switching to Hadoop 2.9.x or 3.x instead
would also work.
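As a rough illustration of the version rule above, the Java sketch below classifies a Hadoop version string. The class and method names (`HadoopVersionCheck`, `meetsRecommendation`) are hypothetical and not part of Flink or Hadoop. The rule encodes only what this comment states: 2.8.6 or later within the 2.8 line, or any 2.9.x or 3.x release. Treating later 2.x lines (such as 2.10) as fine is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: does a Hadoop version meet the recommendation in this comment? */
public final class HadoopVersionCheck {

    // Matches "major.minor.patch" and ignores suffixes such as "-SNAPSHOT".
    private static final Pattern VERSION = Pattern.compile("^(\\d+)\\.(\\d+)\\.(\\d+).*");

    private HadoopVersionCheck() {}

    public static boolean meetsRecommendation(String version) {
        Matcher m = VERSION.matcher(version.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized Hadoop version: " + version);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        int patch = Integer.parseInt(m.group(3));

        if (major >= 3) {
            return true;        // 3.x: reported to work
        }
        if (major == 2 && minor >= 9) {
            return true;        // 2.9.x reported to work; later 2.x lines assumed to as well
        }
        if (major == 2 && minor == 8) {
            return patch >= 6;  // 2.8.x: only a release containing the fix (2.8.6+)
        }
        return false;           // older lines are not covered by this comment
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.8.3", "2.8.6", "2.9.2", "3.2.1"}) {
            System.out.println(v + " -> " + meetsRecommendation(v));
        }
    }
}
```

In a real setup, the version string could come from Hadoop's `org.apache.hadoop.util.VersionInfo.getVersion()`. The sketch takes a plain `String` so it stays self-contained.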


was (Author: fly_in_gis):
After diving into the YARN code, I found that this is a known YARN bug. See 
[YARN-7007|https://issues.apache.org/jira/browse/YARN-7007].

It has been fixed in branch-2 of the Hadoop repository, but no new 2.8.x version 
has been released since the fix was merged. I suggest closing this issue, since the NPE happens 
inside the YARN ResourceManager itself; there is nothing we can do about it in Flink. Once we 
upgrade to a newer Hadoop version (2.9, 3.x), it will no longer be a problem.

> YARNSessionCapacitySchedulerITCase#perJobYarnClusterWithParallelism failed due to NPE
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-15534
>                 URL: https://issues.apache.org/jira/browse/FLINK-15534
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Yu Li
>            Priority: Blocker
>
> As titled, the Travis run fails with the following error:
> {code}
> 07:29:22.417 [ERROR] perJobYarnClusterWithParallelism(org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase)  Time elapsed: 16.263 s  <<< ERROR!
> java.lang.NullPointerException: java.lang.NullPointerException
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:128)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:900)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:660)
>       at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:930)
>       at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:273)
>       at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:507)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
>       at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.perJobYarnClusterWithParallelism(YARNSessionCapacitySchedulerITCase.java:405)
> Caused by: org.apache.hadoop.ipc.RemoteException: java.lang.NullPointerException
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:128)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:900)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:660)
>       at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:930)
>       at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:273)
>       at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:507)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
>       at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.perJobYarnClusterWithParallelism(YARNSessionCapacitySchedulerITCase.java:405)
> {code}
> https://api.travis-ci.org/v3/job/634588108/log.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
