tomicooler opened a new pull request, #6120:
URL: https://github.com/apache/hadoop/pull/6120
The check was introduced in YARN-10901 to avoid a warn message in the NN logs
in certain situations (when /tmp/logs is not owned by the yarn user), but it
adds 3 NameNode calls (create, setPermission, delete) during log aggregation
collection, for every NM.
This means that when a YARN job completes, the check is performed during the
log aggregation phase for every job, from every NodeManager.
In one cluster, 4.2% of all NameNode calls over a 30-minute period were due to
this check.
The "write" calls also need a Namesystem writeLock, so the real impact is
bigger than the call count alone suggests.
Change-Id: I65468aa972860d3b62050fcb41b8b06e417ee8bb
### Description of PR
Added a static concurrent cache that maps the `<FS class type + Log Path>`
key to the `result` of the check.
**Two assumptions:**
- the permissions won't change while the NMs are running
- the set of `<FS class + Log Path>` keys won't grow large
If either assumption does not hold, we might need to come up with a
different approach.
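
For illustration, the caching idea described above can be sketched in plain Java without any Hadoop dependencies. This is a minimal sketch, not the code of the patch: `FsLogPathKey` and `FS_CHMOD_CACHE` are names taken from the debug snippet in this PR, while `ChmodSupportCache`, `supportsChmod` and the `probe` parameter are hypothetical and stand in for the real check.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

/** Hypothetical holder class; only the key and cache names come from the PR. */
final class ChmodSupportCache {

  /** Cache key: file system implementation class plus qualified log path. */
  static final class FsLogPathKey {
    private final Class<?> fsType;
    private final String logPath;

    FsLogPathKey(Class<?> fsType, String logPath) {
      this.fsType = Objects.requireNonNull(fsType);
      this.logPath = Objects.requireNonNull(logPath);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof FsLogPathKey)) {
        return false;
      }
      FsLogPathKey other = (FsLogPathKey) o;
      return fsType.equals(other.fsType) && logPath.equals(other.logPath);
    }

    @Override
    public int hashCode() {
      return Objects.hash(fsType, logPath);
    }
  }

  /** Static, so the result is shared by every caller in the same JVM. */
  private static final Map<FsLogPathKey, Boolean> FS_CHMOD_CACHE =
      new ConcurrentHashMap<>();

  private ChmodSupportCache() {
  }

  /**
   * Returns the cached result for the key, running the (assumed expensive)
   * probe only when the key has not been seen before.
   */
  static boolean supportsChmod(Class<?> fsType, String logPath,
      BooleanSupplier probe) {
    return FS_CHMOD_CACHE.computeIfAbsent(
        new FsLogPathKey(fsType, logPath), k -> probe.getAsBoolean());
  }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the mapping function at most once per key, so concurrent callers with the same key trigger a single probe. The map in this sketch is never evicted, which is why the two assumptions above matter.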
### How was this patch tested?
Added some debug output for the demo:
```
     final FsLogPathKey key = new FsLogPathKey(remoteFS.getClass(),
         qualified);
     FileSystem finalRemoteFS = remoteFS;
+    System.out.println("checking logdir " + qualified);
     FS_CHMOD_CACHE.computeIfAbsent(key, k -> {
+      System.out.println(" computeIfAbsent " + qualified);
       fsSupportsChmod = checkFsSupportsChmod(finalRemoteFS,
           remoteRootLogDir, qualified);
       return fsSupportsChmod;
     });
```
Added some extra calls to `testRemoteDirCreationWithCustomUser`; the actual
check is performed only once:
```
checking logdir /tmp/logs
computeIfAbsent /tmp/logs
checking logdir /tmp/logs
checking logdir /tmp/logs
checking logdir /tmp/logs
```
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'YARN-11578. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?