[ https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947252#comment-13947252 ]
Chris Nauroth commented on MAPREDUCE-5809:
------------------------------------------

[~sureshms], yes, this is a very important consideration. Here is what I had in mind for the logic:
# distcp CLI accepts a new optional flag: -pa for "preserve ACLs". The presence of -pa also implies the existing -pp flag, because ACLs are a super-set of permissions.
# If preserving ACLs, then before submitting the job, distcp sends a canary {{getAclStatus}} request for / on the source and target file systems. This will detect ACL compatibility/support problems and fail fast before even submitting the job. There are three specific sub-cases that this check catches:
## File system is HDFS < 2.4, so the getAclStatus RPC endpoint doesn't exist.
## File system is HDFS >= 2.4, but ACLs are not enabled.
## File system is a {{FileSystem}} subclass that doesn't override the ACL APIs. ({{UnsupportedOperationException}})
# Then, distcp map tasks call {{getAclStatus}} and {{setAcl}} instead of {{setPermission}}.

If the operator wishes to copy files with ACLs to a target cluster and just drop the ACLs, then they'd run without the -pa option, and distcp would continue to work as it does today.

One additional thing I just learned while reading the code is that we support passing the -p flag with no additional arguments, and this is assumed to preserve replication, block size, user, group, permission and checksum type. I'm planning on preserving this behavior. I don't think we can include -pa as part of the defaults, because that could break existing deployments that are running distcp -p if they start using ACLs on one cluster but not the other.

> Enhance distcp to support preserving HDFS ACLs.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-5809
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>   Affects Versions: 2.4.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>
> This issue tracks enhancing distcp to add a new command-line argument for
> preserving HDFS ACLs from the source at the copy destination.

--
This message was sent by Atlassian JIRA (v6.2#6252)
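
The fail-fast check proposed in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself. {{AclProbe}}, {{checkAclSupport}} and {{failFastIfNeeded}} are hypothetical names, and {{AclProbe}} is a stand-in for the single {{getAclStatus}} call on Hadoop's {{FileSystem}} so that the sketch compiles without Hadoop on the classpath. Modeling the first two sub-cases as a plain {{IOException}} is also an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class AclPreflightSketch {

    /** Hypothetical stand-in for the one FileSystem call the check needs. */
    @FunctionalInterface
    interface AclProbe {
        void getAclStatus(String path) throws IOException;
    }

    /**
     * Sends the canary request for "/" to each file system and collects a
     * description of every failure. An empty list means every probe passed.
     */
    static List<String> checkAclSupport(AclProbe... fileSystems) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < fileSystems.length; i++) {
            try {
                fileSystems[i].getAclStatus("/");
            } catch (UnsupportedOperationException e) {
                // Sub-case 3: the FileSystem subclass does not override the ACL APIs.
                problems.add("file system " + i + ": ACL APIs not implemented");
            } catch (IOException e) {
                // Sub-cases 1 and 2, modeled here as IOException (an assumption).
                problems.add("file system " + i + ": " + e.getMessage());
            }
        }
        return problems;
    }

    /** Throws before job submission if -pa was requested and any probe failed. */
    static void failFastIfNeeded(boolean preserveAcls, AclProbe source, AclProbe target) {
        if (!preserveAcls) {
            return; // without -pa no probe is sent and behavior is unchanged
        }
        List<String> problems = checkAclSupport(source, target);
        if (!problems.isEmpty()) {
            throw new IllegalStateException("Cannot preserve ACLs: " + problems);
        }
    }

    public static void main(String[] args) {
        AclProbe supported = path -> { };
        AclProbe aclsDisabled = path -> {
            throw new IOException("ACLs are not enabled");
        };

        // Without -pa the copy proceeds even though the target lacks ACL support.
        failFastIfNeeded(false, supported, aclsDisabled);

        // With -pa the same pair is rejected before any job would be submitted.
        try {
            failFastIfNeeded(true, supported, aclsDisabled);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking both file systems and reporting every failure at once, rather than stopping at the first, is a choice made for this sketch. The comment only requires that the job is not submitted when either check fails.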