[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947252#comment-13947252
 ] 

Chris Nauroth commented on MAPREDUCE-5809:
------------------------------------------

[~sureshms], yes, this is a very important consideration.  Here is what I had 
in mind for the logic:

# distcp CLI accepts a new optional flag: -pa for "preserve ACLs".  The 
presence of -pa also implies the existing -pp flag, because ACLs are a 
superset of permissions.
# If preserving ACLs, then before submitting the job, distcp sends a canary 
{{getAclStatus}} request for / on the source and target file systems.  This 
will detect ACL compatibility/support problems and fail fast before even 
submitting the job.  There are three specific sub-cases that this check catches:
## File system is HDFS < 2.4, so the getAclStatus RPC endpoint doesn't exist.
## File system is HDFS >= 2.4, but ACLs are not enabled.
## File system is a {{FileSystem}} subclass that doesn't override the ACL APIs, 
so the call throws {{UnsupportedOperationException}}.
# Then, distcp map tasks call {{getAclStatus}} and {{setAcl}} instead of 
{{setPermission}}.
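
The fail-fast check in step 2 could be sketched roughly as follows.  This is 
an illustration only, not distcp code: {{AclCapableFs}}, {{AclPreflight}}, 
{{checkAclSupport}} and {{checkBeforeSubmit}} are hypothetical names, and the 
interface is a stand-in for the real {{FileSystem}} API so that the example 
compiles on its own.  It also assumes that the first two sub-cases (old HDFS, 
ACLs disabled) surface as an {{IOException}} from the canary call.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical stand-in for the ACL-related part of a file system client. */
interface AclCapableFs {
    /** Returns the ACL entries for a path, or throws if ACLs are unavailable. */
    List<String> getAclStatus(String path) throws IOException;
}

/** Hypothetical pre-submission check; not part of distcp. */
final class AclPreflight {
    private AclPreflight() {}

    /**
     * Sends one canary getAclStatus request for "/" and turns any failure
     * into a single descriptive exception, so the caller can stop before
     * submitting the job.
     */
    static void checkAclSupport(String label, AclCapableFs fs) {
        try {
            fs.getAclStatus("/");
        } catch (UnsupportedOperationException e) {
            // Sub-case 3: the implementation does not override the ACL APIs.
            throw new IllegalStateException(
                label + " file system does not implement the ACL APIs", e);
        } catch (IOException e) {
            // Sub-cases 1 and 2 (assumption): missing RPC endpoint or ACLs disabled.
            throw new IllegalStateException(
                label + " file system rejected the ACL request: " + e.getMessage(), e);
        }
    }

    /** Checks both ends, but only when ACL preservation was requested. */
    static void checkBeforeSubmit(boolean preserveAcls,
                                  AclCapableFs source, AclCapableFs target) {
        if (!preserveAcls) {
            return; // without -pa, behavior is unchanged and no canary is sent
        }
        checkAclSupport("source", source);
        checkAclSupport("target", target);
    }
}
```

Checking both file systems up front means an incompatible target is reported 
once, before any map task runs, rather than as many task failures mid-copy.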

If the operator wishes to copy files with ACLs to a target cluster and just 
drop the ACLs, then they'd run without the -pa option, and distcp would 
continue to work as it does today.

One additional thing I just learned while reading the code is that we support 
passing the -p flag with no additional arguments, and this is assumed to 
preserve replication, block size, user, group, permission and checksum type.  
I'm planning on preserving this behavior.  I don't think we can include -pa as 
part of the defaults, because that could break existing deployments that are 
running distcp -p if they start using ACLs on one cluster but not the other.
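
To make the intended flag semantics concrete, here is a minimal sketch of the 
parsing rules described above.  It is not the distcp option parser: {{Attr}} 
and {{PreserveFlags}} are hypothetical names, and the single-letter codes 
other than {{p}} and {{a}} are assumed for illustration.

```java
import java.util.EnumSet;

/** Hypothetical set of attributes that a copy can preserve. */
enum Attr { REPLICATION, BLOCK_SIZE, USER, GROUP, PERMISSION, CHECKSUM_TYPE, ACL }

/** Hypothetical parser for a single "-p..." argument; not distcp code. */
final class PreserveFlags {
    private PreserveFlags() {}

    static EnumSet<Attr> parse(String arg) {
        if (!arg.startsWith("-p")) {
            throw new IllegalArgumentException("not a preserve flag: " + arg);
        }
        String letters = arg.substring(2);
        if (letters.isEmpty()) {
            // Bare -p keeps its existing defaults; ACL is deliberately excluded.
            return EnumSet.of(Attr.REPLICATION, Attr.BLOCK_SIZE, Attr.USER,
                              Attr.GROUP, Attr.PERMISSION, Attr.CHECKSUM_TYPE);
        }
        EnumSet<Attr> result = EnumSet.noneOf(Attr.class);
        for (char c : letters.toCharArray()) {
            switch (c) {
                case 'r': result.add(Attr.REPLICATION); break;
                case 'b': result.add(Attr.BLOCK_SIZE); break;
                case 'u': result.add(Attr.USER); break;
                case 'g': result.add(Attr.GROUP); break;
                case 'p': result.add(Attr.PERMISSION); break;
                case 'c': result.add(Attr.CHECKSUM_TYPE); break;
                case 'a':
                    // ACLs are a superset of permissions, so -pa implies -pp.
                    result.add(Attr.ACL);
                    result.add(Attr.PERMISSION);
                    break;
                default:
                    throw new IllegalArgumentException("unknown attribute: " + c);
            }
        }
        return result;
    }
}
```

With these rules, {{parse("-p")}} never contains {{ACL}}, so an existing 
distcp -p invocation keeps working when only one cluster has ACLs enabled, 
while {{parse("-pa")}} yields both {{ACL}} and {{PERMISSION}}.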

> Enhance distcp to support preserving HDFS ACLs.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-5809
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.4.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>
> This issue tracks enhancing distcp to add a new command-line argument for 
> preserving HDFS ACLs from the source at the copy destination.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
