[ https://issues.apache.org/jira/browse/MAPREDUCE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225672#comment-14225672 ]
Arun Suresh commented on MAPREDUCE-6171:
----------------------------------------
[~dian.fu], any reason why the _yarn_ user is blacklisted from _DECRYPT_EEK_
calls? My understanding was that only the HDFS admin, i.e. the _hdfs_ user,
needs to be blacklisted.
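For reference, a blacklist of this kind is typically expressed with the
hadoop.kms.blacklist.DECRYPT_EEK property in the KMS ACL configuration; a
minimal sketch, assuming a standard KMS setup (the actual config of the
cluster in question is not shown in this issue):
{code}
<property>
  <name>hadoop.kms.blacklist.DECRYPT_EEK</name>
  <value>hdfs</value>
</property>
{code}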
> The visibilities of the distributed cache files and archives should be
> determined by both their permissions and if they are located in HDFS
> encryption zone
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6171
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6171
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: security
> Reporter: Dian Fu
>
> The visibilities of the distributed cache files and archives are currently
> determined solely by the permissions of these files or archives.
> The following is the logic of the isPublic() method in the
> ClientDistributedCacheManager class:
> {code}
> static boolean isPublic(Configuration conf, URI uri,
>     Map<URI, FileStatus> statCache) throws IOException {
>   FileSystem fs = FileSystem.get(uri, conf);
>   Path current = new Path(uri.getPath());
>   // the leaf level file should be readable by others
>   if (!checkPermissionOfOther(fs, current, FsAction.READ, statCache)) {
>     return false;
>   }
>   return ancestorsHaveExecutePermissions(fs, current.getParent(),
>       statCache);
> }
> {code}
> On the NodeManager side, the "yarn" user is used to download public files,
> while the user who submitted the job is used to download private files. In
> normal cases there is no problem with this. However, if a file is located in
> an encryption zone (HDFS-6134) and the yarn user is configured in the KMS to
> be disallowed from fetching the Data Encryption Key (DEK) of that encryption
> zone, the download of this file will fail.
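> To make the user mapping concrete, here is a minimal sketch of that decision
> (the real NodeManager localization logic is considerably more involved;
> LocalResourceVisibility is the actual YARN enum, while the helper below is
> hypothetical):
> {code}
> import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
>
> class LocalizerUserSketch {
>   // PUBLIC resources are downloaded once by the shared "yarn" user;
>   // PRIVATE and APPLICATION resources are downloaded as the job submitter.
>   static String downloadUser(LocalResourceVisibility vis, String jobUser) {
>     return vis == LocalResourceVisibility.PUBLIC ? "yarn" : jobUser;
>   }
> }
> {code}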
> You can reproduce this issue with the following steps (assume the job is
> submitted as user "testUser"):
> # create a clean cluster with the HDFS cryptographic FileSystem feature
> enabled
> # create directory "/data/" in HDFS and make it an encryption zone with
> keyName "testKey"
> # configure the KMS so that only user "testUser" is allowed to decrypt the
> DEK of key "testKey":
> {code}
> <property>
>   <name>key.acl.testKey.DECRYPT_EEK</name>
>   <value>testUser</value>
> </property>
> {code}
> # execute the "teragen" job as user "testUser":
> {code}
> su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar teragen 10000 /data/terasort-input"
> {code}
> # execute the "terasort" job as user "testUser":
> {code}
> su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar terasort /data/terasort-input /data/terasort-output"
> {code}
> You will see logs like this at the job submitter's console:
> {code}
> INFO mapreduce.Job: Job job_1416860917658_0002 failed with state FAILED due
> to: Application application_1416860917658_0002 failed 2 times due to AM
> Container for appattempt_1416860917658_0002_000002 exited with exitCode:
> -1000 due to: org.apache.hadoop.security.authorize.AuthorizationException:
> User [yarn] is not authorized to perform [DECRYPT_EEK] on key with ACL name
> [testKey]!!
> {code}
> The initial idea to solve this issue is to modify the logic in
> ClientDistributedCacheManager.isPublic() to also consider whether the file
> is in an encryption zone. If it is, the file should be considered private,
> and the NodeManager will then use the user who submitted the job to fetch
> it.
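> A minimal sketch of that idea, assuming the class's existing getFileStatus()
> cache helper and the FileStatus.isEncrypted() flag introduced with the
> HDFS-6134 work (an illustration of the proposal, not a final patch):
> {code}
> static boolean isPublic(Configuration conf, URI uri,
>     Map<URI, FileStatus> statCache) throws IOException {
>   FileSystem fs = FileSystem.get(uri, conf);
>   Path current = new Path(uri.getPath());
>   // A file inside an HDFS encryption zone is treated as private so that
>   // the NodeManager localizes it as the submitting user (who is allowed
>   // to decrypt the DEK) rather than as the "yarn" user.
>   if (getFileStatus(fs, uri, statCache).isEncrypted()) {
>     return false;
>   }
>   // the leaf level file should be readable by others
>   if (!checkPermissionOfOther(fs, current, FsAction.READ, statCache)) {
>     return false;
>   }
>   return ancestorsHaveExecutePermissions(fs, current.getParent(),
>       statCache);
> }
> {code}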
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)