Dian Fu created MAPREDUCE-6171:
----------------------------------

             Summary: The visibilities of the distributed cache files and 
archives should be determined by both their permissions and whether they are 
located in an HDFS encryption zone
                 Key: MAPREDUCE-6171
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6171
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: security
            Reporter: Dian Fu


The visibilities of the distributed cache files and archives are currently 
determined solely by the permissions of those files or archives. 
The following is the logic of the method isPublic() in the class 
ClientDistributedCacheManager:
{code}
static boolean isPublic(Configuration conf, URI uri,
      Map<URI, FileStatus> statCache) throws IOException {
    FileSystem fs = FileSystem.get(uri, conf);
    Path current = new Path(uri.getPath());
    //the leaf level file should be readable by others
    if (!checkPermissionOfOther(fs, current, FsAction.READ, statCache)) {
      return false;
    }
    return ancestorsHaveExecutePermissions(fs, current.getParent(), statCache);
  }
{code}
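In other words, a cache file is public only if the file itself is world-readable 
and every ancestor directory is world-executable; the file's encryption zone 
state plays no role. That decision can be illustrated with a self-contained 
sketch (plain Java, no Hadoop dependency; isPublicFor() and the raw "other" 
permission bits are illustrative, not Hadoop APIs):
{code}
// Illustrative only: models the isPublic() decision with POSIX "other" bits.
public class VisibilitySketch {
    // bit 4 = read, bit 1 = execute, for the "other" permission class
    static boolean otherHas(int otherBits, int bit) {
        return (otherBits & bit) != 0;
    }

    // leafOtherBits: "other" bits of the file itself;
    // ancestorOtherBits: "other" bits of each ancestor directory
    static boolean isPublicFor(int leafOtherBits, int[] ancestorOtherBits) {
        if (!otherHas(leafOtherBits, 4)) {   // leaf must be o+r
            return false;
        }
        for (int bits : ancestorOtherBits) { // every ancestor must be o+x
            if (!otherHas(bits, 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 644 file under 755 directories -> public
        System.out.println(isPublicFor(4, new int[]{5, 5})); // true
        // 640 file -> private, regardless of ancestors
        System.out.println(isPublicFor(0, new int[]{5, 5})); // false
    }
}
{code}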
On the NodeManager side, the "yarn" user is used to download public files, 
while the user who submitted the job is used to download private files. In 
normal cases this works fine. However, if a file is located in an encryption 
zone (HDFS-6134) and KMS is configured to disallow the yarn user from fetching 
the Data Encryption Key (DEK) of that encryption zone, the download of that 
file will fail. 

You can reproduce this issue with the following steps (assume you submit the 
job as user "testUser"): 
# create a clean cluster with the HDFS cryptographic FileSystem feature 
enabled
# create the directory "/data/" in HDFS and make it an encryption zone with 
key name "testKey"
# configure KMS so that only user "testUser" is allowed to decrypt the DEK of 
key "testKey"
{code}
  <property>
    <name>key.acl.testKey.DECRYPT_EEK</name>
    <value>testUser</value>
  </property>
{code}
# execute job "teragen" with user "testUser":
{code}
su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar teragen 
10000 /data/terasort-input" 
{code}
# execute job "terasort" with user "testUser":
{code}
su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar terasort 
/data/terasort-input /data/terasort-output"
{code}

You will see an error like the following at the job submitter's console:
{code}
INFO mapreduce.Job: Job job_1416860917658_0002 failed with state FAILED due to: 
Application application_1416860917658_0002 failed 2 times due to AM Container 
for appattempt_1416860917658_0002_000002 exited with  exitCode: -1000 due to: 
org.apache.hadoop.security.authorize.AuthorizationException: User [yarn] is not 
authorized to perform [DECRYPT_EEK] on key with ACL name [testKey]!!
{code}

The initial idea for solving this issue is to modify the logic of 
ClientDistributedCacheManager.isPublic() to also consider whether the file is 
in an encryption zone. If it is, the file should be considered private.
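A self-contained sketch of that decision order (plain Java, no Hadoop 
dependency): the real patch would instead query the NameNode for the zone of 
the path, for instance via the encryption zone client APIs added by HDFS-6134 
(e.g. HdfsAdmin#getEncryptionZoneForPath); the names below are illustrative.
{code}
import java.util.Arrays;
import java.util.List;

// Illustrative only: encryption zones are modeled as a list of root paths.
public class ProposedVisibility {
    static boolean inEncryptionZone(String path, List<String> zoneRoots) {
        for (String root : zoneRoots) {
            String prefix = root.endsWith("/") ? root : root + "/";
            if (path.equals(root) || path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // permissionsSayPublic stands in for the existing o+r / ancestor o+x checks
    static boolean isPublicFor(String path, boolean permissionsSayPublic,
                               List<String> zoneRoots) {
        if (inEncryptionZone(path, zoneRoots)) {
            return false; // encrypted files are always localized as private
        }
        return permissionsSayPublic;
    }

    public static void main(String[] args) {
        List<String> zones = Arrays.asList("/data");
        // world-readable, but inside the zone -> private
        System.out.println(isPublicFor("/data/terasort-input/part-0", true, zones));
        // world-readable, outside any zone -> public
        System.out.println(isPublicFor("/public/lib.jar", true, zones));
    }
}
{code}
With this ordering, an encrypted file is localized as a private resource by the 
submitting user ("testUser" in the scenario above), who is allowed to decrypt 
the DEK, so the localization failure no longer occurs.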



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
