[ https://issues.apache.org/jira/browse/TEZ-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455687#comment-15455687 ]
Hitesh Shah commented on TEZ-3422:
----------------------------------

[~off...@chaosmail.at] The code in question adds the path configured for tez.lib.uris to the YARN local resources (distributed cache). Local resources in the distributed cache have different visibility constraints, and the visibility affects performance because it determines how many times a file is downloaded onto a NodeManager where a Tez container is launched.

{code}
LocalResourceVisibility lrVisibility;
if (checkAncestorPermissionsForAllUsers(conf, url.getFile(), FsAction.EXECUTE)
    && fStatus.getPermission().getOtherAction().implies(FsAction.READ)) {
  lrVisibility = LocalResourceVisibility.PUBLIC;
} else {
  lrVisibility = LocalResourceVisibility.PRIVATE;
}
{code}

By checking whether the file is publicly accessible to all users, we can set the visibility of the Tez tarball configured in tez.lib.uris to PUBLIC and thereby allow YARN to download it only once per machine for all Tez applications that run there.

I took a quick look at the MR codebase, and it seems that MR does not do this check but instead relies on the visibility info being set via configs for some reason. \cc [~vinodkv] in case I am mistaken.

> TEZ performs EXECUTE on / (root directory)
> ------------------------------------------
>
>                 Key: TEZ-3422
>                 URL: https://issues.apache.org/jira/browse/TEZ-3422
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>         Environment: HDP 2.4, TEZ 0.7.0.2.4
>            Reporter: Christoph Körner
>            Priority: Critical
>              Labels: security
>
> When scheduling a TEZ job via beeline on Yarn, TEZ performs an EXECUTE
> operation on HDFS in the directories /hdp/apps, /hdp and / (root directory).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
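
For illustration only, the visibility decision described in the comment above can be modelled without any Hadoop dependency. The sketch below is not the Tez or YARN implementation: the class `VisibilitySketch`, its methods, the permission strings, and the file name `tez.tar.gz` are all hypothetical. It assumes the ancestor check walks from the file's parent directory up to the root and requires EXECUTE for "other" on each directory, which would be consistent with the directories listed in the issue description (/hdp/apps, /hdp and /).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the PUBLIC/PRIVATE decision; not Hadoop code.
class VisibilitySketch {
    enum Visibility { PUBLIC, PRIVATE }

    // Ancestors of an absolute path, nearest first, ending at "/".
    static List<String> ancestors(String path) {
        List<String> result = new ArrayList<>();
        String current = path;
        while (!current.equals("/")) {
            int slash = current.lastIndexOf('/');
            current = (slash <= 0) ? "/" : current.substring(0, slash);
            result.add(current);
        }
        return result;
    }

    // perms is a 9-character POSIX-style string such as "rwxr-xr-x";
    // characters 6..8 describe the "other" class.
    static boolean othersCan(String perms, char action) {
        int index = (action == 'r') ? 6 : (action == 'w') ? 7 : 8;
        return perms.charAt(index) == action;
    }

    // PUBLIC only if every ancestor grants execute to "other"
    // and the file itself grants read to "other".
    static Visibility decide(Map<String, String> permsByPath, String file) {
        for (String dir : ancestors(file)) {
            String dirPerms = permsByPath.get(dir);
            if (dirPerms == null || !othersCan(dirPerms, 'x')) {
                return Visibility.PRIVATE;
            }
        }
        String filePerms = permsByPath.get(file);
        if (filePerms != null && othersCan(filePerms, 'r')) {
            return Visibility.PUBLIC;
        }
        return Visibility.PRIVATE;
    }

    public static void main(String[] args) {
        String file = "/hdp/apps/tez.tar.gz"; // hypothetical tarball location
        Map<String, String> perms = new HashMap<>();
        perms.put("/", "rwxr-xr-x");
        perms.put("/hdp", "rwxr-xr-x");
        perms.put("/hdp/apps", "rwxr-xr-x");
        perms.put(file, "rw-r--r--");

        System.out.println(ancestors(file));     // directories that get checked
        System.out.println(decide(perms, file)); // world-readable all the way up

        perms.put("/hdp", "rwxr-x---");          // "other" loses execute on /hdp
        System.out.println(decide(perms, file)); // falls back to PRIVATE
    }
}
```

In this model, a single ancestor that denies execute to "other" is enough to make the resource PRIVATE, which is why each directory up to the root has to be inspected before PUBLIC can be chosen.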