[ 
https://issues.apache.org/jira/browse/HDFS-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149674#comment-16149674
 ] 

Xiao Chen edited comment on HDFS-12291 at 8/31/17 10:37 PM:
------------------------------------------------------------

Thanks [~surendrasingh] for the work. When writing the re-encryption code I 
didn't think it would be reused, so I'm glad to see the abstraction and reuse 
here!

High level comments:
- The biggest headache for re-encryption was renames. That's why HDFS-10899 is 
designed to be more like a maintenance feature and disables renames while the 
zone is under re-encryption. For the same reason the default throttling is 1.0 
(no throttling) and the NN will try to process everything asap. (Rename support 
is separated out to a sub-jira and is not done currently.) The difficulty here 
is guaranteeing that no inodes are lost during the iteration. Specifically:
-# We're iterating A-Z.
-# We hit the threshold at M, so we release the lock, process the batched work, 
then reacquire the lock.
-# During the above, Y is renamed to B2.
-# We resume from the children list (at N), not knowing Y was renamed.
Some thoughts about rename (from a while ago) were:
-- We can track the renames separately and process them in the next iteration. 
We also need to make sure they don't mess up the progress tracking.
-- If there are too many renames, we can abort the operation to prevent an OOM.
-- Handling NN failover is the worst part of this headache. Our options would 
be to either edit-log the renames as well (and make sure we do not exceed the 
[xattr size limit|https://issues.apache.org/jira/browse/HDFS-6344]), or restart 
from the beginning.
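To make the hazard in the steps above concrete, here is a minimal, 
self-contained Java sketch (class and method names are illustrative, not the 
actual {{FSTreeTraverser}} code): a traversal that resumes by the last 
processed child name will silently skip an inode that a concurrent rename 
moved behind the cursor while the lock was released.

```java
import java.util.SortedSet;
import java.util.TreeSet;

class ResumeByNameTraversal {
  /**
   * Process children in name order, batching; onBatchDone stands in for the
   * window where the FSDirectory lock would be released between batches.
   */
  static int process(TreeSet<String> children, int batchSize, Runnable onBatchDone) {
    int processed = 0;
    String cursor = null;
    while (true) {
      // Resume strictly after the last processed name.
      SortedSet<String> tail =
          (cursor == null) ? children : children.tailSet(cursor, false);
      if (tail.isEmpty()) {
        break;
      }
      int n = 0;
      for (String name : tail) {
        cursor = name;          // remember the last processed child name
        processed++;
        if (++n == batchSize) {
          break;                // threshold hit: the lock would be released here
        }
      }
      // Between batches a rename can move an unprocessed child (e.g. "Y" ->
      // "A2") to sort *before* the cursor; it is then skipped on resume
      // unless renames are tracked separately.
      onBatchDone.run();
    }
    return processed;
  }
}
```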

- This is purely from re-encryption experience and may not hold true for SPS. 
It turned out the performance bottleneck was setting up a connection to the 
KMS for each EDEK, and I had to change the API to be a batched interface, 
where a batch of EDEKs is sent over in one call. I think it would be 
beneficial to run some benchmarks early, so we know whether SPS updating 
xattrs one by one on the NN would be troublesome. Hopefully not :) (i.e. 
what's the expected ops/sec to the NN from SPS?)
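As an illustration of that batching pattern (the client class and its methods 
below are hypothetical, not the real KeyProvider/KMS API), the idea is simply 
to accumulate work and amortize the per-request connection cost over one call 
per batch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class BatchedEdekClient {
  private final List<String> pending = new ArrayList<>();
  private final List<String> results = new ArrayList<>();
  private final int batchSize;
  // Stand-in for the remote call; takes a batch of EDEKs, returns re-encrypted ones.
  private final Function<List<String>, List<String>> kmsBatchCall;
  private int calls = 0;

  BatchedEdekClient(int batchSize, Function<List<String>, List<String>> kmsBatchCall) {
    this.batchSize = batchSize;
    this.kmsBatchCall = kmsBatchCall;
  }

  /** Queue one EDEK; fire a single remote call once a full batch is queued. */
  void submit(String edek) {
    pending.add(edek);
    if (pending.size() == batchSize) {
      flush();
    }
  }

  /** Send whatever is queued as one call (e.g. at end of traversal). */
  void flush() {
    if (pending.isEmpty()) {
      return;
    }
    calls++;
    results.addAll(kmsBatchCall.apply(new ArrayList<>(pending)));
    pending.clear();
  }

  int calls() { return calls; }
  List<String> results() { return results; }
}
```

With a batch size of B, N EDEKs cost roughly ceil(N/B) round trips instead of 
N, which is exactly why the per-EDEK connection setup stopped being the 
bottleneck.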

Detailed comments:
- Now that we're creating a generic {{FSTreeTraverser}} (nicely done!), feel 
free to update the javadoc / class structures as you see fit. No objections. :)
- Suggest adding a link in the class javadoc to the traversal-order javadoc on 
{{traverseDir}}. We should also call out that this relies on 
{{FSDirectory}}'s read lock, and how the throttling is done in general - only 
after each whole batch is done.
- In {{ReencryptionHandler#processFileInode}} we had to type-cast 
{{traverseInfo}} to {{ZoneTraverseInfo}} for the {{equals}} comparison. An 
alternative would be to add a method on the base {{TraverseInfo}} class and 
override it (e.g. {{shouldAddINodeToBatch(Object)}} - I'm not good at naming 
though). I see {{FileInodeIdCollector}} also does a similar comparison there, 
though without requiring a type cast.
- As Uma pointed out, there are a few places where 're-encryption' or 'zone' 
is still mentioned in the {{FSTreeTraverser}} class.
- Can we change the name of 'rootId'? The first reaction to 'root' in the NN 
is the 
[rootINode|https://github.com/apache/hadoop/blob/branch-3.0.0-alpha4/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeId.java#L36]
 aka {{/}}. Perhaps 'baseId' or 'startId'?
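A rough sketch of what the override-based alternative could look like 
({{shouldAddINodeToBatch}} is just my placeholder name, and the bodies below 
are illustrative, not the patch's actual code):

```java
class TraverseInfo {
  /** Default: accept every candidate inode into the batch. */
  boolean shouldAddINodeToBatch(Object candidate) {
    return true;
  }
}

class ZoneTraverseInfo extends TraverseInfo {
  private final String zoneKeyVersion;

  ZoneTraverseInfo(String zoneKeyVersion) {
    this.zoneKeyVersion = zoneKeyVersion;
  }

  /**
   * What was previously an explicit cast + equals in processFileInode:
   * only batch files whose key version matches the zone's.
   */
  @Override
  boolean shouldAddINodeToBatch(Object fileKeyVersion) {
    return zoneKeyVersion.equals(fileKeyVersion);
  }
}
```

The traverser then calls {{traverseInfo.shouldAddINodeToBatch(...)}} 
polymorphically and never needs to know about {{ZoneTraverseInfo}}.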



> [SPS]: Provide a mechanism to recursively iterate and satisfy storage policy 
> of all the files under the given dir
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12291
>                 URL: https://issues.apache.org/jira/browse/HDFS-12291
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Rakesh R
>            Assignee: Surendra Singh Lilhore
>         Attachments: HDFS-12291-HDFS-10285-01.patch, 
> HDFS-12291-HDFS-10285-02.patch
>
>
> For the given source path directory, presently SPS considers only the files 
> immediately under the directory (only one level of scanning) when satisfying 
> the policy. It won't do recursive directory scanning and then schedule SPS 
> tasks to satisfy the storage policy of all the files down to the leaf nodes. 
> The idea of this jira is to discuss & implement an efficient recursive 
> directory iteration mechanism that satisfies the storage policy for all the 
> files under the given directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
