[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225672#comment-14225672
 ] 

Arun Suresh commented on MAPREDUCE-6171:
----------------------------------------

[~dian.fu], any reason why the _yarn_ user is blacklisted from _DECRYPT_EEK_ 
calls? My understanding was that only the HDFS admin, i.e. the _hdfs_ user, 
needs to be blacklisted.
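
For reference, that blacklist is normally configured in the KMS ACL file 
(kms-acls.xml); a minimal sketch, assuming the stock 
{{hadoop.kms.blacklist.*}} key names from the Hadoop KMS documentation:

{code}
  <!-- Deny DECRYPT_EEK to the HDFS superuser, which never needs file DEKs -->
  <property>
    <name>hadoop.kms.blacklist.DECRYPT_EEK</name>
    <value>hdfs</value>
  </property>
{code}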

> The visibilities of the distributed cache files and archives should be 
> determined by both their permissions and if they are located in HDFS 
> encryption zone
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6171
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6171
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: security
>            Reporter: Dian Fu
>
> The visibilities of the distributed cache files and archives are currently 
> determined solely by the permissions of these files or archives. 
> The following is the logic of the method isPublic() in class 
> ClientDistributedCacheManager:
> {code}
> static boolean isPublic(Configuration conf, URI uri,
>       Map<URI, FileStatus> statCache) throws IOException {
>     FileSystem fs = FileSystem.get(uri, conf);
>     Path current = new Path(uri.getPath());
>     //the leaf level file should be readable by others
>     if (!checkPermissionOfOther(fs, current, FsAction.READ, statCache)) {
>       return false;
>     }
>     return ancestorsHaveExecutePermissions(fs, current.getParent(), 
> statCache);
>   }
> {code}
> At the NodeManager side, the "yarn" user is used to download public files, 
> while the user who submitted the job is used to download private files. In 
> normal cases this works fine. However, if a file is located in an encryption 
> zone (HDFS-6134) and the yarn user is not allowed by KMS to fetch the 
> Data Encryption Key (DEK) of that encryption zone, the download of that 
> file will fail. 
> You can reproduce this issue with the following steps (assuming you submit 
> the job as user "testUser"): 
> # create a clean cluster which has HDFS cryptographic FileSystem feature
> # create directory "/data/" in HDFS and make it an encryption zone with 
> keyName "testKey"
> # configure KMS so that only user "testUser" is allowed to decrypt the DEK 
> of key "testKey"
> {code}
>   <property>
>     <name>key.acl.testKey.DECRYPT_EEK</name>
>     <value>testUser</value>
>   </property>
> {code}
> # execute job "teragen" with user "testUser":
> {code}
> su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar 
> teragen 10000 /data/terasort-input" 
> {code}
> # execute job "terasort" with user "testUser":
> {code}
> su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar 
> terasort /data/terasort-input /data/terasort-output"
> {code}
> You will see logs like this at the job submitter's console:
> {code}
> INFO mapreduce.Job: Job job_1416860917658_0002 failed with state FAILED due 
> to: Application application_1416860917658_0002 failed 2 times due to AM 
> Container for appattempt_1416860917658_0002_000002 exited with  exitCode: 
> -1000 due to: org.apache.hadoop.security.authorize.AuthorizationException: 
> User [yarn] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [testKey]!!
> {code}
> The initial idea for solving this issue is to modify the logic in 
> ClientDistributedCacheManager.isPublic() to also consider whether the file 
> is in an encryption zone. If it is, the file should be treated as private, 
> and the NodeManager will then use the user who submitted the job to fetch 
> it.
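
The proposed rule above can be sketched as a minimal model in plain Java (not 
the actual patch; the predicate parameters are hypothetical stand-ins for the 
real FileSystem permission checks and an encryption-zone lookup such as 
HdfsAdmin#getEncryptionZoneForPath()):

```java
import java.util.function.Predicate;

// Minimal model of the proposed visibility rule for MAPREDUCE-6171
// (hypothetical names; not the committed code). The predicates stand in
// for the real permission checks and the encryption-zone lookup.
class CacheVisibilitySketch {
  static boolean isPublic(String path,
                          Predicate<String> othersCanRead,
                          Predicate<String> ancestorsExecutable,
                          Predicate<String> inEncryptionZone) {
    // A file inside an encryption zone is forced private, so the NodeManager
    // downloads it as the job submitter, who holds DECRYPT_EEK permission.
    if (inEncryptionZone.test(path)) {
      return false;
    }
    // Otherwise the existing rule applies: the leaf must be world-readable
    // and every ancestor directory world-executable.
    return othersCanRead.test(path) && ancestorsExecutable.test(path);
  }
}
```

With this rule, a world-readable file under "/data/" (an encryption zone) is 
still classified private, avoiding the AuthorizationException above.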



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
