zhengchenyu opened a new pull request, #5975:
URL: https://github.com/apache/hadoop/pull/5975

   
   ### Description of PR
   
   To avoid repeatedly sending the same NMToken to an application, the 
ResourceManager uses NMTokenSecretManagerInRM, whose appAttemptToNodeKeyMap 
records, per AppAttempt, the nodes for which an NMToken has already been 
issued. 
   An unmanaged ApplicationMaster (UAM) has only one AppAttempt. After the UAM 
restarts, the NMTokens it previously held are lost. However, because 
NMTokenSecretManagerInRM::appAttemptToNodeKeyMap is not cleared, the 
ResourceManager does not resend NMTokens for nodes it has already recorded, 
and launching a container then fails with a missing-NMToken error. The 
specific error is as follows:
   
   ```
   No NMToken sent for XX_HOST:XX_PORT
   at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:262)
   at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:252)
   at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:137)
   at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:433)
   at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:146)
   at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
   ```
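
The failure mode can be illustrated with a minimal sketch. This is not the actual NMTokenSecretManagerInRM code: the class `NMTokenDedupSketch`, its method names, and the string-typed IDs and tokens are hypothetical stand-ins that only model the per-attempt bookkeeping described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical model of per-attempt NMToken de-duplication; not Hadoop code. */
final class NMTokenDedupSketch {
  // attempt ID -> nodes that have already been sent a token for this attempt
  private final Map<String, Set<String>> appAttemptToNodeKeyMap = new HashMap<>();

  /** Returns a token only the first time a node is seen for the attempt. */
  Optional<String> createTokenIfAbsent(String attemptId, String nodeId) {
    Set<String> nodes =
        appAttemptToNodeKeyMap.computeIfAbsent(attemptId, k -> new HashSet<>());
    if (!nodes.add(nodeId)) {
      // Node already recorded: nothing is resent, even if the AM lost its token.
      return Optional.empty();
    }
    return Optional.of("token-for-" + nodeId);
  }

  /** Forgets which nodes were served, so tokens are issued again on request. */
  void clearNodeSetForAttempt(String attemptId) {
    Set<String> nodes = appAttemptToNodeKeyMap.get(attemptId);
    if (nodes != null) {
      nodes.clear();
    }
  }
}
```

In this model, a restarted UAM that reuses the same attempt ID gets an empty result for every node it contacted before the restart, unless the node set for that attempt is cleared first.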
   
   ### How was this patch tested?
   
   Unit tests, plus testing on a real cluster.
   
   
   ### For code changes:
   
   Currently, when a UAM re-registers, appAttemptToNodeKeyMap is cleared only 
if there are transferredContainers. This patch moves the clearing code earlier 
so that it no longer depends on that condition.
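
As a rough sketch of the shape of this change (not the actual patch): `UamReRegisterSketch`, its fields, and both method names below are hypothetical, and the real re-registration path in the ResourceManager does considerably more than this.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical before/after model of the re-registration path; not Hadoop code. */
final class UamReRegisterSketch {
  // Nodes already sent an NMToken for the single UAM attempt.
  final Set<String> nodesWithToken = new HashSet<>();

  /** Before: the node set is cleared only when containers are transferred. */
  void reRegisterBefore(List<String> transferredContainers) {
    if (!transferredContainers.isEmpty()) {
      nodesWithToken.clear();
      // ... report transferred containers back to the AM ...
    }
  }

  /** After: the clear is moved ahead of the check, so it always runs. */
  void reRegisterAfter(List<String> transferredContainers) {
    nodesWithToken.clear();
    if (!transferredContainers.isEmpty()) {
      // ... report transferred containers back to the AM ...
    }
  }
}
```

With no transferred containers, `reRegisterBefore` leaves the stale node set in place, while `reRegisterAfter` empties it so that NMTokens can be issued again.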
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

