[ https://issues.apache.org/jira/browse/TEZ-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455687#comment-15455687 ]
Hitesh Shah commented on TEZ-3422:
----------------------------------
[[email protected]]
The code in question adds the path configured in tez.lib.uris to the YARN
local resources (the distributed cache). Local resources can have different
visibilities (PUBLIC, PRIVATE or APPLICATION), and the visibility affects
performance because it determines how many times a file is downloaded onto a
NodeManager where a Tez container is launched.
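{code}
// PUBLIC only if every parent directory of the file, up to and including /,
// grants EXECUTE to all users ("other" bits) and the file itself is readable
// by all users; otherwise the resource is localized as PRIVATE.
LocalResourceVisibility lrVisibility;
if (checkAncestorPermissionsForAllUsers(conf, url.getFile(),
        FsAction.EXECUTE) &&
    fStatus.getPermission().getOtherAction().implies(FsAction.READ)) {
  lrVisibility = LocalResourceVisibility.PUBLIC;
} else {
  lrVisibility = LocalResourceVisibility.PRIVATE;
}
{code}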
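To make the permission walk concrete, here is a Hadoop-free sketch of the decision in the snippet above. It assumes, as the method name suggests, that checkAncestorPermissionsForAllUsers inspects each parent directory from the nearest one up to /, stopping at the first one that does not grant EXECUTE to others. The class VisibilitySketch, its helpers, the permission strings and the path /hdp/apps/tez/tez.tar.gz are all hypothetical stand-ins for the real FileSystem/FileStatus calls, not Tez code.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free model of the visibility decision in the Tez snippet.
// Permissions are Unix-style strings such as "rwxr-xr-x"; only the last three
// characters (the "other" bits, i.e. all users) are consulted.
public class VisibilitySketch {

  public enum Visibility { PUBLIC, PRIVATE }

  // Returns the ancestor directories of an absolute path, nearest first, ending at "/".
  public static List<String> ancestors(String file) {
    List<String> result = new ArrayList<>();
    String p = file;
    while (!p.equals("/")) {
      int slash = p.lastIndexOf('/');
      p = (slash == 0) ? "/" : p.substring(0, slash);
      result.add(p);
    }
    return result;
  }

  // True if the "other" triplet of perm grants bit, where bit is 'r', 'w' or 'x'.
  public static boolean otherAllows(String perm, char bit) {
    return perm.charAt(6 + "rwx".indexOf(bit)) == bit;
  }

  // PUBLIC only if every ancestor (including "/") is world-executable and the file
  // is world-readable; the walk stops at the first ancestor that fails.
  public static Visibility decide(String file, Map<String, String> perms) {
    for (String dir : ancestors(file)) {
      if (!otherAllows(perms.get(dir), 'x')) {
        return Visibility.PRIVATE;
      }
    }
    return otherAllows(perms.get(file), 'r') ? Visibility.PUBLIC : Visibility.PRIVATE;
  }

  public static void main(String[] args) {
    String lib = "/hdp/apps/tez/tez.tar.gz"; // hypothetical tez.lib.uris target
    Map<String, String> perms = Map.of(
        "/", "rwxr-xr-x",
        "/hdp", "rwxr-xr-x",
        "/hdp/apps", "rwxr-xr-x",
        "/hdp/apps/tez", "rwxr-xr-x",
        lib, "rw-r--r--");
    System.out.println(ancestors(lib));     // [/hdp/apps/tez, /hdp/apps, /hdp, /]
    System.out.println(decide(lib, perms)); // PUBLIC
  }
}
{code}

Because the walk ends at /, the root directory's permissions are consulted whenever all lower directories pass, which fits the EXECUTE checks on /hdp/apps, /hdp and / described in this issue.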
By checking whether the file is accessible by all users, we can set the
visibility of the Tez tarball configured in tez.lib.uris to PUBLIC, thereby
allowing YARN to download it only once per machine and share it across all
Tez applications that run there.
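The download-count argument can be illustrated with a simplified model of how YARN scopes localized resources on a NodeManager: PUBLIC resources are shared by all users on the node, PRIVATE ones by all applications of the same user, and APPLICATION ones only within a single application. The class CacheScopeSketch, its App type and its methods are hypothetical; the model ignores cache eviction and concurrent localization, so treat it as an illustration rather than YARN's actual localizer logic.

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model (an assumption, not YARN's localizer code) of how
// LocalResourceVisibility scopes the NodeManager's cache of downloaded files.
public class CacheScopeSketch {

  public enum Visibility { PUBLIC, PRIVATE, APPLICATION }

  // One application with a container on the node (hypothetical type).
  public static final class App {
    final String user;
    final String appId;

    public App(String user, String appId) {
      this.user = user;
      this.appId = appId;
    }
  }

  // Key under which a downloaded copy would be shared on one node.
  static String cacheKey(Visibility v, App app) {
    switch (v) {
      case PUBLIC:
        return "public";              // one copy for every user and app
      case PRIVATE:
        return "user:" + app.user;    // one copy per user
      default:
        return "app:" + app.appId;    // APPLICATION: one copy per app
    }
  }

  // Downloads of the same tarball onto one node, ignoring cache eviction.
  public static int downloads(Visibility v, List<App> appsOnNode) {
    Set<String> keys = new HashSet<>();
    for (App app : appsOnNode) {
      keys.add(cacheKey(v, app));
    }
    return keys.size();
  }

  public static void main(String[] args) {
    List<App> apps = List.of(
        new App("alice", "app_1"),
        new App("alice", "app_2"),
        new App("bob", "app_3"));
    for (Visibility v : Visibility.values()) {
      System.out.println(v + " -> " + downloads(v, apps));
    }
    // PUBLIC -> 1, PRIVATE -> 2, APPLICATION -> 3
  }
}
{code}

With three applications from two users on one node, only PUBLIC visibility keeps the tarball at a single download, which is the benefit the permission check is meant to unlock.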
I took a quick look at the MR codebase, and it seems that MR does not do
this check but instead relies on the visibility being set via configs for
some reason. \cc [~vinodkv] in case I am mistaken.
> TEZ performs EXECUTE on / (root directory)
> ------------------------------------------
>
> Key: TEZ-3422
> URL: https://issues.apache.org/jira/browse/TEZ-3422
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Environment: HDP 2.4, TEZ 0.7.0.2.4
> Reporter: Christoph Körner
> Priority: Critical
> Labels: security
>
> When scheduling a TEZ job via beeline on YARN, TEZ performs an EXECUTE
> operation on HDFS in the directories /hdp/apps, /hdp and / (root directory).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)