[
https://issues.apache.org/jira/browse/HDFS-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149674#comment-16149674
]
Xiao Chen edited comment on HDFS-12291 at 8/31/17 10:37 PM:
------------------------------------------------------------
Thanks [~surendrasingh] for the work. When writing the re-encryption code I
didn't think it would be reused, so I'm glad to see the abstraction and reuse
here!
High-level comments:
- The biggest headache for re-encryption was renames. That's why HDFS-10899 is
designed to be more of a maintenance feature and disables renames while the
zone is under re-encryption. For this reason the default throttling is 1.0 (no
throttling) and the NN will try to process everything asap. (Rename support is
separated out into a sub-jira and is not done currently.) The difficulty here
is guaranteeing that no inodes are lost in the iteration. Specifically:
-# We're iterating A-Z.
-# We hit the threshold at M, hence release the lock, process the batched work,
then reacquire the lock
-# During the above, Y is renamed to B2.
-# We resume from the children list at N, not knowing Y was renamed.
Some thoughts about rename (from a while ago) were:
-- We can track the renames separately and process them in the next iteration.
We also need to make sure they don't mess up progress tracking.
-- If there are too many renames, we can abort the operation to prevent OOM.
-- Handling NN failover is the worst part of this headache. Our options would
be to either edit-log the renames as well (and make sure we do not exceed the
[xattr size limit|https://issues.apache.org/jira/browse/HDFS-6344]), or start
over from the beginning.
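The race in the numbered steps above can be sketched as follows. This is a
simplified, hypothetical model (not the actual NN code): children are iterated
in sorted name order, the lock is "released" after each batch, and a rename
that moves an entry behind the resume cursor causes it to be skipped.

```java
import java.util.*;

public class RenameRaceSketch {
    static List<String> traverse() {
        // Children of the directory, sorted by name (like an INode child list).
        TreeSet<String> children = new TreeSet<>(Arrays.asList("A", "M", "N", "Y"));
        List<String> processed = new ArrayList<>();
        final int batchSize = 2;
        String cursor = null; // last processed name; resume point after re-acquiring the lock
        boolean renamed = false;

        while (true) {
            // "Holding the lock": collect the next batch after the cursor.
            List<String> batch = new ArrayList<>();
            for (String name : (cursor == null ? children : children.tailSet(cursor, false))) {
                batch.add(name);
                if (batch.size() == batchSize) break;
            }
            if (batch.isEmpty()) break;
            processed.addAll(batch);
            cursor = batch.get(batch.size() - 1);

            // "Lock released" here: simulate Y being renamed to B2 while the
            // batched work is being processed.
            if (!renamed) {
                children.remove("Y");
                children.add("B2");
                renamed = true;
            }
        }
        return processed; // B2 sorts before the cursor and is never visited
    }

    public static void main(String[] args) {
        System.out.println(traverse()); // [A, M, N] -- Y (now B2) is lost
    }
}
```

The traversal ends having visited only A, M and N; the entry renamed to B2 is
silently lost, which is exactly why renames are disabled during re-encryption.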
- This is purely from the re-encryption experience and may not hold true for
SPS. It turned out the performance bottleneck was setting up a connection to
the KMS for each EDEK, and I had to change the API to be a batched interface,
where a batch of EDEKs is sent over in one call. I think it would be beneficial
to run some benchmarks early, so we know whether SPS updating xattrs one by one
on the NN would be troublesome. Hopefully not :) (i.e. what's the expected
ops/sec to the NN from SPS?)
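A back-of-the-envelope model shows why the batched interface helped. The cost
numbers below are made-up placeholders, not measurements: with a fixed per-call
setup cost, sending N items one by one costs N * (setup + work), while one
batched call costs setup + N * work.

```java
public class BatchCostSketch {
    static final long SETUP_MS = 50; // assumed per-connection setup cost to the KMS
    static final long WORK_MS = 1;   // assumed per-EDEK processing cost

    // Total time when each EDEK is sent in its own call.
    static long oneByOneCost(int n) {
        return n * (SETUP_MS + WORK_MS);
    }

    // Total time when all EDEKs are sent in one batched call.
    static long batchedCost(int n) {
        return SETUP_MS + n * WORK_MS;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("one-by-one: " + oneByOneCost(n) + " ms"); // 51000 ms
        System.out.println("batched:    " + batchedCost(n) + " ms");  // 1050 ms
    }
}
```

The same reasoning applies to SPS: if xattr updates go to the NN one by one,
the per-op overhead dominates, which is what an early benchmark would surface.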
Detailed comments:
- Now that we're creating a generic {{FSTreeTraverser}} (nicely done!), feel
free to update the javadoc / class structures as you see fit. No objections. :)
- Suggest adding a link in the class javadoc to the traversal-order javadoc on
{{traverseDir}}. We should also call out that this relies on the
{{FSDirectory}} read lock, and how the throttling is done in general - only
after each whole batch is done.
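For the javadoc's sake, the batch-level throttling could be illustrated along
these lines. The ratio semantics and formula are assumptions modeled on the
re-encryption throttler, not the actual traverser code:

```java
public class ThrottleSketch {
    // How long to sleep after a whole batch, given how long the lock was held.
    // A ratio of 1.0 (the default) means no throttling; 0.5 means sleep as
    // long as the lock was held, yielding the lock roughly half the time.
    static long throttleSleepMs(long lockHeldMs, double throttleRatio) {
        if (throttleRatio >= 1.0) {
            return 0; // no throttling: process everything asap
        }
        // Total cycle time = lockHeldMs / ratio; sleep for the remainder.
        return (long) (lockHeldMs / throttleRatio) - lockHeldMs;
    }

    public static void main(String[] args) {
        System.out.println(throttleSleepMs(100, 1.0)); // 0
        System.out.println(throttleSleepMs(100, 0.5)); // 100
    }
}
```

The key point for the javadoc is that this sleep happens only between batches,
never per-inode, so the lock hold time per batch is what bounds NN impact.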
- In {{ReencryptionHandler#processFileInode}} we had to type cast
{{traverseInfo}} to {{ZoneTraverseInfo}} for the {{equals}} comparison. An
alternative would be to add a method to the base {{TraverseInfo}} class and
override it (e.g. {{shouldAddINodeToBatch(Object)}} - I'm not good at naming
though). I see {{FileInodeIdCollector}} also does a similar comparison there,
though without requiring a type cast.
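The suggested refactoring could look roughly like the following. The method
name follows the comment's own (admittedly tentative) suggestion, and the
field/constructor shapes are illustrative, not the actual patch:

```java
// Base class used by the generic traverser; accepts everything by default.
class TraverseInfo {
    public boolean shouldAddINodeToBatch(Object context) {
        return true;
    }
}

// Re-encryption's subclass narrows the check, replacing the type cast and
// equals comparison previously done in ReencryptionHandler#processFileInode.
class ZoneTraverseInfo extends TraverseInfo {
    private final String zoneName;

    ZoneTraverseInfo(String zoneName) {
        this.zoneName = zoneName;
    }

    @Override
    public boolean shouldAddINodeToBatch(Object context) {
        return zoneName.equals(context);
    }
}
```

The handler then calls {{traverseInfo.shouldAddINodeToBatch(...)}} directly,
and each traversal user (re-encryption, SPS) supplies its own policy without
the caller needing to know the concrete subclass.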
- As Uma pointed out, there are a few places where 're-encryption' or 'zone' is
still mentioned in the {{FSTreeTraverser}} class.
- Can we change the name of 'rootId'? The first association with 'root' in the
NN is the
[rootINode|https://github.com/apache/hadoop/blob/branch-3.0.0-alpha4/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeId.java#L36],
aka {{/}}. Perhaps 'baseId' or 'startId'?
> [SPS]: Provide a mechanism to recursively iterate and satisfy storage policy
> of all the files under the given dir
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-12291
> URL: https://issues.apache.org/jira/browse/HDFS-12291
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, namenode
> Reporter: Rakesh R
> Assignee: Surendra Singh Lilhore
> Attachments: HDFS-12291-HDFS-10285-01.patch,
> HDFS-12291-HDFS-10285-02.patch
>
>
> For the given source path directory, presently SPS considers only the files
> immediately under the directory (only one level of scanning) when satisfying
> the policy. It WON'T do recursive directory scanning and schedule SPS tasks
> to satisfy the storage policy of all the files down to the leaf nodes.
> The idea of this jira is to discuss & implement an efficient recursive
> directory iteration mechanism that satisfies the storage policy for all the
> files under the given directory.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)