[ https://issues.apache.org/jira/browse/TEZ-4638 ]


    Dong0829 deleted comment on TEZ-4638:
    -------------------------------

was (Author: li0829):
The fix captures the AM UGI when DAGClientAMProtocolBlockingPBServerImpl is 
created. This UGI holds all the tokens that the Tez AM container uses, so 
whenever the handler needs to talk to HDFS, it uses the AM UGI where possible.
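
The capture-then-doAs idea can be sketched with a plain-JDK model. This is only 
an illustration, not the code in TEZ-4638.patch: Identity, Handler, 
readPlanFromHdfs and the path /staging/dag.plan are hypothetical stand-ins, and 
a ThreadLocal plays the role of the current UGI. In Hadoop itself the 
corresponding calls are UserGroupInformation.getCurrentUser() and 
UserGroupInformation.doAs(...).

```java
import java.util.Set;
import java.util.concurrent.Callable;

public class AmUgiSketch {
    /** Hypothetical stand-in for a UGI: a user name plus the service tokens it holds. */
    static final class Identity {
        final String user;
        final Set<String> tokens;

        Identity(String user, Set<String> tokens) {
            this.user = user;
            this.tokens = tokens;
        }
    }

    // Models the "current UGI" of the running thread.
    private static final ThreadLocal<Identity> CURRENT = new ThreadLocal<>();

    /** Runs the action with the given identity as current, then restores the previous one. */
    static <T> T doAs(Identity id, Callable<T> action) throws Exception {
        Identity previous = CURRENT.get();
        CURRENT.set(id);
        try {
            return action.call();
        } finally {
            CURRENT.set(previous);
        }
    }

    /** Models an HDFS read: it succeeds only if the current identity holds an HDFS token. */
    static String readPlanFromHdfs(String path) {
        Identity id = CURRENT.get();
        if (id == null || !id.tokens.contains("HDFS")) {
            throw new SecurityException("Client cannot authenticate via:[TOKEN, KERBEROS]");
        }
        return "plan:" + path;
    }

    /** Models the RPC handler object that the AM creates at startup. */
    static final class Handler {
        // Captured at construction time, while the AM identity is still current.
        private final Identity amIdentity = CURRENT.get();

        // Before the fix: the read runs as whatever identity the RPC server set.
        String submitDagBeforeFix(String path) {
            return readPlanFromHdfs(path);
        }

        // After the fix: the read is wrapped in doAs with the captured AM identity.
        String submitDagAfterFix(String path) throws Exception {
            return doAs(amIdentity, () -> readPlanFromHdfs(path));
        }
    }

    public static void main(String[] args) throws Exception {
        Identity am = new Identity("tez-am", Set.of("HDFS", "KMS"));
        Identity remote = new Identity("hive-client", Set.of());
        String path = "/staging/dag.plan"; // hypothetical path

        // The handler is created inside the AM, so it sees the AM identity.
        Handler handler = doAs(am, Handler::new);

        // The RPC server invokes the handler as the remote client user.
        String before;
        try {
            before = doAs(remote, () -> handler.submitDagBeforeFix(path));
        } catch (SecurityException e) {
            before = "failed: " + e.getMessage();
        }
        String after = doAs(remote, () -> handler.submitDagAfterFix(path));

        System.out.println("before fix: " + before);
        System.out.println("after fix:  " + after);
    }
}
```

Capturing the identity once at construction keeps the handler independent of 
whichever remote user the RPC layer installs for a given call.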

 

 

To reproduce, run a big Hive query (change the limit if the plan is too small to trigger the issue).

 

Before the fix

 
{quote}{{2025-06-30T10:18:18,760 INFO  [ce4666f9-a278-4f15-be97-ae59b727e14b main([])]: client.TezClient (:()) - Send dag plan using YARN local resources since it's too large, dag plan size=385547, max dag plan size through IPC=128974848, max IPC message size= 134217728
2025-06-30T10:18:18,809 INFO  [ce4666f9-a278-4f15-be97-ae59b727e14b main([])]: exec.Task (:()) - Dag submit failed due to DestHost:destPort ip-172-31-93-189.ec2.internal:8020 , LocalHost:localPort ip-172-31-93-68.ec2.internal/172.31.93.68:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:964)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:939)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1679)
        at org.apache.hadoop.ipc.Client.call(Client.java:1620)
        at org.apache.hadoop.ipc.Client.call(Client.java:1517)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)}}
{quote}
After the fix

 
{quote}{{2025-06-30T10:34:52,975 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: client.TezClient (:()) - Send dag plan using YARN local resources since it's too large, dag plan size=389516, max dag plan size through IPC=128974848, max IPC message size= 134217728
2025-06-30T10:34:53,171 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: client.FrameworkClient (:()) - Submitted dag to TezSession, sessionName=HIVE-8f4c8a93-f6a9-4d6d-a813-cb946649815a, applicationId=application_1751278111719_0008, dagId=dag_1751278111719_0008_1, dagName=select count(*) from drone_orders where ...0 (Stage-1)
2025-06-30T10:34:53,490 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Status: Running (Executing on YARN cluster with App id application_1751278111719_0008)
2025-06-30T10:34:53,505 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: -/-  Reducer 2: 0/1
2025-06-30T10:34:56,532 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: -/-  Reducer 2: 0/1
2025-06-30T10:34:57,541 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0/54 Reducer 2: 0/1
2025-06-30T10:35:00,565 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0/54 Reducer 2: 0/1
2025-06-30T10:35:01,070 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0(+3)/54     Reducer 2: 0/1
2025-06-30T10:35:02,084 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0(+5)/54     Reducer 2: 0/1
2025-06-30T10:35:02,589 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0(+7)/54     Reducer 2: 0/1
2025-06-30T10:35:03,598 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0(+9)/54     Reducer 2: 0/1
2025-06-30T10:35:04,103 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0(+11)/54    Reducer 2: 0/1}}
{quote}

> Client authenticate failure when using Kerberos if there is big DAG plan 
> needed HDFS
> ------------------------------------------------------------------------------------
>
>                 Key: TEZ-4638
>                 URL: https://issues.apache.org/jira/browse/TEZ-4638
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.10.2
>            Reporter: Dong0829
>            Priority: Major
>         Attachments: TEZ-4638.patch
>
>
> Whenever the DAG plan is big and exceeds the limit, the DAG plan is uploaded 
> to HDFS. After the Tez AM gets this request, it needs to read the plan from 
> HDFS, but in a Kerberos cluster it hits the error below:
> {quote}{{10.239.88.12:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> ....
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:172)
>     at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:8519)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1226)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1145)
>     at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3388)}}
> {quote}
> Root cause: the submitDAG request is handled by the RPC server, and the 
> Hadoop server runs the handler as the remote RPC client user via doAs, which 
> makes that user the current UGI (see the stack above).
> The remote UGI has none of the Tez AM's context, which holds the tokens for 
> KMS, HDFS and so on, so the call fails when it talks to HDFS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
