[
https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634823#comment-15634823
]
Andrew Wang commented on HDFS-10899:
------------------------------------
Thanks for working on this Xiao! I didn't give the whole patch a deep look, but
some review comments to start it off. I didn't get to the KMS changes, it'd
help to split those into a separate patch to make review easier.
Comments as follows:
* getShortUsage, are -path / -cancel / -verify exclusive operations? We should
indicate this in the usage text if so. {{man git}} for instance looks like:
{noformat}
usage: git [--version] [--help] [-C <path>] [-c name=value]
[--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
[-p | --paginate | --no-pager] [--no-replace-objects] [--bare]
[--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
<command> [<args>]
{noformat}
So {{<command> <path>}} would work here, with the long usage explaining the
commands.
* We have new functionality to specify time with units in configuration, want
to use it here?
* New keys should be documented in hdfs-default.xml also
EZManager:
* Some unused imports in I'm sure checkstyle will complain about
* Typo: "filesReencryptedInCurentZone" -> "Current"
* Can the new methods in EZManager be encapsulated as a class? Could use some
javadoc about expected behavior also.
* Empty {{finally}} case in the run method, can be deleted?
* Thread should check if the NN is active or standby, typically we do this in
FSNamesystem#startActiveServices and FSNamesystem#stopActiveServices
* reencryptDir looks like it takes the writeLock the entire time while it's
processing the files in a directory, including when talking to the KMS. We
should not hold the lock while talking to the KMS.
* reencryptDir is also a big method, refactor the innards of the for loop?
* We also should do a logSync before we log the "Completed" message in
{{reencryptEncryptionZoneInt}}, so we block first. Maybe elsewhere too.
* Calling {{getListing}} will generate audit logs, we should be calling
{{getListingInt}} instead. Also, we don't need a full {{HdfsFileStatus}} to
reencrypt each file, so let's consider an INode-based traversal.
* Structuring this as an iterator (and maybe also a visitor) would make the
code more clear.
* Recommend we rename {{updateReencryptStatus}} so it's clear that this is used
during edit log / fsimage loading, or at least add some javadoc.
ReencryptInfo:
* Recommend we name ReencryptInfo something like "PendingReencryptionZones" or
something to be more self documenting. The javadoc also could mention that this
is basically a queue.
* A small warning that {{hasZone}} will be {{O(n)}} if the JDK implementation
is truly a queue, maybe we should use a LinkedHashMap instead? I don't think we
use the deque nature.
* How is the reencryptInfo in EZManager synchronized? I ask because it's a
ConcurrentLinkedDeque, is there actually concurrency happening?
> Add functionality to re-encrypt EDEKs.
> --------------------------------------
>
> Key: HDFS-10899
> URL: https://issues.apache.org/jira/browse/HDFS-10899
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: encryption, kms
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Attachments: HDFS-10899.wip.patch, Re-encrypt edek design doc.pdf
>
>
> Currently when an encryption zone (EZ) key is rotated, it only takes effect
> on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key
> rotation, for improved security.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]