[
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096037#comment-14096037
]
Charles Lamb commented on HADOOP-10919:
---------------------------------------
I'll update the HDFS-6509 doc to reflect the bit about trashing.
{quote}
1. src subtree and dst subtree do not have EZ - easy, same as today
{quote}
Agreed.
{quote}
2. src subtree has no EZ but dest does have EZ in a portion of its subtree.
Possible outcomes
1. if user performing operation has permissions in dest EZ then the files
within the dest EZ subtree are encrypted
{quote}
Agreed.
{quote}
2. src subtree has no EZ but dest does have EZ in a portion of its subtree.
Possible outcomes
...
2. if user does not (say Admin) what do we expect to happen?
{quote}
The behavior should be the same as what happens today: user (the admin) gets a
permission violation because the admin does not have access to the target.
{quote}
3. src subtree has EZ but dest does not. Possible outcomes
1. files copied as encrypted but cannot be decrypted at the dest since
it does not have an EZ - useful as a backup
{quote}
With /.reserved/raw (abbreviated /.r/r below): raw files are copied to dest, so
dest contains encrypted (and unreadable) files.
Without /.r/r (!/.r/r): files are decrypted by distcp and copied to dst in the
clear. They are readable because they were decrypted during the copy.
{quote}
3. src subtree has EZ but dest does not. Possible outcomes
...
2. files copied as encrypted and a matching EZ is created automatically.
Can an admin do this operation since he does not have access to the keys?
{quote}
I don't think that distcp can, or should, create a matching EZ automatically.
It is too hard for it to know what the intent of the copy is. Should the new ez
have the same ez-key as the src ez or a different one? Sure, we could have an
option to let the user specify that, but for the first crack I wanted to keep
it fairly simple. So, the theory is that the admin creates the empty EZ before
performing the distcp. The admin can either set up the EZ with the same ez-key
as the src ez (call this "(a)" below), or the dest can have a different ez-key
than the src (call this "(b)" below). After the ez is created, distcp will
try to maintain the files as encrypted. In either of those scenarios, there are
a couple of cases:
distcp with /.r/r: (a) works ok because the EDEKs for each file are copied from
src to dst. (b) does not work because when the files are opened in the dest
hierarchy, the EDEKs will be decrypted with the new ez-key(dst) and that won't
work. This could be made to work by having the KMS decrypt the EDEKs and
re-encrypt them with the new ez-key(dst), but it would assume that the distcp
invoker had proper credentials with the KMS for the keys. So in general this
scenario is only useful when the src-ez and the dst-ez have been set up with the
same ez-key. There are other issues with this that are discussed under
HDFS-6134, such as different key lengths, etc.
distcp with no /.r/r: Both of (a) and (b) work ok as long as the invoker has
access to the files that are being copied. distcp decrypts the files on read
and they get re-encrypted on write. This is pretty much the same as today.
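The four combinations above can be summarized as a small decision function.
This is only an illustrative sketch in Python, not distcp code: the function
name `copy_outcome`, its parameters, and the "readable"/"unreadable" labels are
hypothetical, and it assumes the invoker has access to the files being copied.

```python
def copy_outcome(use_raw: bool, same_ez_key: bool) -> str:
    """Expected state of files copied into a pre-created dest EZ.

    use_raw:     src and dst are both given under /.reserved/raw ("/.r/r").
    same_ez_key: True for case (a), dst EZ uses the same ez-key as src;
                 False for case (b), dst EZ uses a different ez-key.
    """
    if use_raw:
        if same_ez_key:
            # (a): EDEKs are copied verbatim and the shared ez-key decrypts them.
            return "readable"
        # (b): copied EDEKs cannot be decrypted with the dst ez-key.
        return "unreadable"
    # No /.r/r: data is decrypted on read and re-encrypted on write,
    # so the ez-key of the dest zone does not matter.
    return "readable"


for raw in (True, False):
    for same in (True, False):
        print(f"raw={raw!s:5} same_ez_key={same!s:5} -> {copy_outcome(raw, same)}")
```

Only the /.r/r copy into a zone with a different ez-key comes out unreadable,
which matches the conclusion that raw copies are useful only with a shared
ez-key.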
{quote}
3. src subtree has EZ but dest does not. Possible outcomes
...
3. throw an error which can be overridden by a flag, in which case the
files are decrypted, copied to dest, and left decrypted. This only works
if the user has permissions for decryption; admin cannot do this.
{quote}
/.r/r: The files aren't decrypted so this scenario is perfectly acceptable.
!/.r/r: As you say, the admin can't do this because they presumably don't have
access to the files on the src (and probably not on the target either). So this
scenario is about some random user doing a distcp of some subset of the tree on
their own. I think that what you're suggesting is a way of trying to keep the
user from shooting themselves in the foot by ensuring that they don't leave
unencrypted data hanging around in the dest. I can see this both ways. On the
one hand, someone has given the user access to the files and keys. They are
expected to "do the right thing" with the decrypted file contents, including
not putting it somewhere "unsafe". It is "transparent encryption" after all.
And they might actually want to leave it hanging around in unencrypted form
because (e.g.) maybe dst is on a cluster inside a SCIF and it's ok to leave the
files unencrypted.
But I think I like your suggestion that we throw an exception in this case
(user not using /.r/r, any of the source paths are in an ez, dest is not in an
ez) unless a flag is set.
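The condition in that last sentence can be written down as a guard. The sketch
below is a hypothetical illustration in Python, not a proposed patch: the
exception class, the function `check_copy`, and the flag name
`allow_decrypted_dest` are all invented here, and no such distcp option is
implied to exist.

```python
class UnencryptedDestinationError(Exception):
    """Raised when a copy would leave decrypted data outside any EZ."""


def check_copy(use_raw, src_in_ez, dst_in_ez, allow_decrypted_dest=False):
    """Refuse a copy that would silently leave decrypted files in dest.

    use_raw:              the copy goes through /.reserved/raw ("/.r/r").
    src_in_ez:            one boolean per source path, True if it is in an EZ.
    dst_in_ez:            True if the destination is inside an EZ.
    allow_decrypted_dest: hypothetical override flag set by the user.
    """
    if use_raw:
        # Raw copies never decrypt, so nothing can leak in the clear.
        return
    if any(src_in_ez) and not dst_in_ez and not allow_decrypted_dest:
        raise UnencryptedDestinationError(
            "source data is in an EZ but dest is not; "
            "set the override flag to copy it decrypted"
        )


# One source path in an EZ, dest outside any EZ, no override: rejected.
try:
    check_copy(use_raw=False, src_in_ez=[False, True], dst_in_ez=False)
except UnencryptedDestinationError as err:
    print("rejected:", err)
```

With the flag set, the same call returns normally, which covers the case where
leaving the data unencrypted in dest is intentional.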
{quote}
4. both src and dest have EZ at exactly the same part of the subtree.
Possible outcomes
1. If user has permission to decrypt and encrypt, then the data is copied
and encryption is redone with new keys,
2. If user does not have permission then ?? Fail or copy as raw?
{quote}
I think we should just treat this the same as current behavior. We just attempt
it: if it works, great; if not, we throw the exception. So I don't think
there's anything unusual here.
{quote}
5. both src and dest have EZ at different parts of the subtree. This should
reduce to 2 or 3.
{quote}
Agreed.
> Copy command should preserve raw.* namespace extended attributes
> ----------------------------------------------------------------
>
> Key: HADOOP-10919
> URL: https://issues.apache.org/jira/browse/HADOOP-10919
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 3.0.0
> Reporter: Charles Lamb
> Assignee: Charles Lamb
> Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
>
> Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch
>
>
> Refer to the doc attached to HDFS-6509 for background.
> Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve
> extended attributes in the raw.* namespace by default whenever the src and
> target are in /.reserved/raw. To not preserve raw xattrs, don't specify
> /.reserved/raw in either the src or target.