Dong0829 created TEZ-4638:
-----------------------------

Summary: Client authenticate failure when using Kerberos if there is big DAG plan needed HDFS
Key: TEZ-4638
URL: https://issues.apache.org/jira/browse/TEZ-4638
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.10.2
Reporter: Dong0829

Whenever the DAG plan is large and exceeds the limit, the plan is uploaded to HDFS. When the Tez AM receives the submit request, it has to read the plan back from HDFS, but on a Kerberos cluster this fails with the error below:

{quote}
{{10.239.88.12:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    ....
    org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:172)
    at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:8519)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1226)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1145)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3388)}}
{quote}

Root cause: the submitDAG request is handled by the RPC server, and the Hadoop IPC server runs the handler with the remote RPC client user as the current UGI, via doAs (see the stack above). That remote UGI does not carry the Tez AM's context, which is where the tokens for KMS, HDFS and so on are held, so when the handler talks to HDFS under the remote UGI it cannot authenticate and the call fails.
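
To make the root cause concrete, here is a minimal, self-contained Java sketch of the mechanism. It does not use Hadoop classes: {{User}}, {{doAs}}, {{readPlanFromHdfs}} and the two handler methods are hypothetical stand-ins, and the token names are only illustrative strings. In Hadoop the corresponding pieces would be {{UserGroupInformation.doAs}} and the AM's own login user ({{UserGroupInformation.getLoginUser()}}). Whether the actual fix takes this shape is an assumption and is not stated in this issue.

```java
import java.util.Set;
import java.util.function.Supplier;

// Simplified model only; none of these names are Tez or Hadoop APIs.
final class UgiContextSketch {

    /** Stand-in for a UGI: a user name plus the kinds of tokens it holds. */
    static final class User {
        final String name;
        final Set<String> tokens;

        User(String name, Set<String> tokens) {
            this.name = name;
            this.tokens = tokens;
        }
    }

    // The "current user" of the calling thread, as set by doAs.
    private static final ThreadLocal<User> CURRENT = new ThreadLocal<>();

    /** Runs the action with the given user as the current user, then restores the previous one. */
    static <T> T doAs(User user, Supplier<T> action) {
        User previous = CURRENT.get();
        CURRENT.set(user);
        try {
            return action.get();
        } finally {
            CURRENT.set(previous);
        }
    }

    /** Stand-in for the HDFS read: succeeds only if the current user holds an HDFS token. */
    static String readPlanFromHdfs() {
        User user = CURRENT.get();
        if (user == null || !user.tokens.contains("HDFS_DELEGATION_TOKEN")) {
            throw new SecurityException("Client cannot authenticate via:[TOKEN, KERBEROS]");
        }
        return "dag-plan-bytes";
    }

    /** What the issue describes: the handler reads HDFS while running as the remote caller. */
    static String handleAsCaller(User remoteCaller) {
        return doAs(remoteCaller, UgiContextSketch::readPlanFromHdfs);
    }

    /** One possible direction (an assumption): switch to the AM's own context for the read. */
    static String handleWithAmContext(User remoteCaller, User amUser) {
        return doAs(remoteCaller, () -> doAs(amUser, UgiContextSketch::readPlanFromHdfs));
    }

    public static void main(String[] args) {
        User am = new User("tez-am", Set.of("HDFS_DELEGATION_TOKEN", "kms-dt"));
        User caller = new User("client", Set.of());

        try {
            handleAsCaller(caller);
        } catch (SecurityException e) {
            System.out.println("as caller: " + e.getMessage());
        }
        System.out.println("with AM context: " + handleWithAmContext(caller, am));
    }
}
```

The first handler fails because the remote caller's context holds no tokens. The second succeeds because the read runs under the context that holds them, even though the outer call is still executing as the remote caller.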