[
https://issues.apache.org/jira/browse/FLINK-15534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011620#comment-17011620
]
Yang Wang edited comment on FLINK-15534 at 1/9/20 11:38 AM:
------------------------------------------------------------
After digging into the YARN code, I found that this is a known YARN bug. See
YARN-7007.
It has been fixed in branch-2 of the Hadoop repository, but no new 2.8.x version
has been released since the fix was merged. I suggest closing this issue, since
the NPE happens inside the YARN ResourceManager and there is nothing we can do
in Flink.
Once the new Hadoop 2.8 release (2.8.6) is out, we need to bump the
flink-shaded-hadoop version to 2.8.6. Using Hadoop 2.9.x or 3.x instead
also works.
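As a rough illustration of the version ranges named above, here is a minimal
Java sketch. The class and method names are hypothetical and not part of Flink
or Hadoop. It only encodes what this comment states (2.8.6, 2.9.x and 3.x
work) and treats every other version as "not stated to contain the fix".

```java
// Hypothetical helper, not part of Flink or Hadoop.
// Encodes only the version ranges mentioned in this comment.
final class HadoopVersionCheck {

    /**
     * Returns true if the given Hadoop version is one this comment says
     * works (2.8.6 or later in the 2.8 line, 2.9.x, or 3.x).
     * Returns false for anything else, which means "not stated as fixed",
     * not "known to be broken".
     */
    static boolean isKnownFixed(String version) {
        // Accepts plain versions ("2.8.5") and suffixed ones ("2.8.3-9.0").
        String[] parts = version.split("[.-]");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        int patch = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;

        if (major >= 3) {
            return true; // 3.x, per the comment
        }
        if (major == 2 && minor >= 9) {
            return true; // 2.9.x, per the comment
        }
        // 2.8 line: only the not-yet-released 2.8.6 is expected to carry the fix.
        return major == 2 && minor == 8 && patch >= 6;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.8.3", "2.8.6", "2.9.2", "3.1.0"}) {
            System.out.println(v + " known fixed: " + isKnownFixed(v));
        }
    }
}
```

Such a check could be used to skip the affected test on older Hadoop builds,
though whether that is desirable is a separate decision.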
was (Author: fly_in_gis):
After digging into the YARN code, I found that this is a known YARN bug. See
[YARN-7007|https://issues.apache.org/jira/browse/YARN-7007].
It has been fixed in branch-2 of the Hadoop repository, but no new 2.8.x version
has been released since the fix was merged. I suggest closing this issue, since
the NPE happens inside the YARN ResourceManager and there is nothing we can do
in Flink. After we upgrade to a newer Hadoop version (2.9, 3.x), it will no
longer be a problem.
> YARNSessionCapacitySchedulerITCase#perJobYarnClusterWithParallelism failed
> due to NPE
> -------------------------------------------------------------------------------------
>
> Key: FLINK-15534
> URL: https://issues.apache.org/jira/browse/FLINK-15534
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.11.0
> Reporter: Yu Li
> Priority: Blocker
>
> As titled, the Travis run fails with the error below:
> {code}
> 07:29:22.417 [ERROR] perJobYarnClusterWithParallelism(org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase) Time elapsed: 16.263 s <<< ERROR!
> java.lang.NullPointerException: java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:128)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:900)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:660)
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:930)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:273)
> at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:507)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
> at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.perJobYarnClusterWithParallelism(YARNSessionCapacitySchedulerITCase.java:405)
> Caused by: org.apache.hadoop.ipc.RemoteException: java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:128)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:900)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:660)
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:930)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:273)
> at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:507)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)
> at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.perJobYarnClusterWithParallelism(YARNSessionCapacitySchedulerITCase.java:405)
> {code}
> https://api.travis-ci.org/v3/job/634588108/log.txt