[
https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870903#comment-15870903
]
Andrew Wang commented on HDFS-10899:
------------------------------------
Hi Xiao, this latest rev looks like a big improvement in readability, and
thanks for the speedy work getting through all the previous comments!
Nit:
* FSEditLog, rename logCompleteReencryptZone to logReencryptZoneComplete to
match name of the op and the Status method?
ReencryptionHandler:
* processCurrentBatch, checkReadyForWrite can throw an exception, and it's not
guarded by the try/finally for the writeLock.
* processCurrentBatch, it looks like we ride over a GeneralSecurityException or
null return value when talking to the KMS. Shouldn't this either throw, or be
retried? We don't want to miss reencrypting an EDEKs.
* High-level, it'd be good to think about the behavior when the KMS is
slow/flaky/down. It looks like right now an IOException will throw all the way
up to {{run}} which has a fixed sleep, but we might want some backoff for
connectivity issues.
* In reencryptEncryptionZone, I think restoreFromLastProcessedFile also needs
the read lock held when calling? Sprinkling additional paranoia {{hasReadLock}}
/ {{hasWriteLock}} checks everywhere would be appreciated.
* {{restoreFromLPF}}, javadoc looks out of date since there's no longer a
{{subdirs}}
* The new throttling mechanism could use some code comments
* Would still prefer this if statement to be in the constructor, have it throw
an exception all the way up to the FSDirectory constructor:
{code}
if (ezManager.getProvider() == null) {
LOG.info("No provider set, cannot re-encrypt");
return;
}
{code}
* Similarly, we should do some bounds checking on all the config keys in the
constructor (like > 0). I think {{isEnabled}} becomes extraneous afterwards. I
don't think enable/disable is necessary since reencryption is already "opt-in"
in that an admin has to trigger it, but if we really do want a flag, I'd prefer
we do it with a new config key rather than with the throttling configs.
* I still find the call hierarchy complicated to reason about, since it has two
entry points and is bi-recursive (run -> reencryptEZ -> restoreFromLPF or
reencryptDir -> reencryptDirInt <-> reencryptINode). Rather than recursion,
have you considered using a stack or queue to hold unprocessed items? I think
this gets us towards a single reencryptDir call in reencryptEZ, since
restoreFromLPF would be responsible for repopulating the stack/queue.
* reencryptDirInt, I see there's a fixup for {{i}} if things have changed, but
I don't think this is sufficient. Imagine if the dir's children have been
deleted. {{INodeDirectory#nextChild}} can return a negative index, which will
always be less than the for loop termination condition {{children.size()}}.
{{children.get(i)}} won't behave correctly with a negative index.
* reencryptINode, needs documentation on the return value
* If the # of items is an exact multiple of {{reencryptBatchSize}}, will the
final call to {{processCurrentBatch}} with {{lastBatch = true}} in
{{reencryptEncryptionZone}} correctly log the completion? It returns early if
{{currentBatch}} is empty. IMO we should separate batch processing from logging
the completion, there doesn't seem like much code sharing.
* Could use some class-level documentation about the iteration order (appears
to be depth-first) and how we handle deletes and creates.
* Add a Precondition check in ReencryptTracker that passed inode is not null?
This is important so the {{Objects.equals}} in processCurrentBatch behaves
correctly.
EZManager:
* Rename {{getDir}} to {{getFSDirectory}} for clarity?
Follow-on:
* This relates to the status follow-on you're planning. I noticed cancelling a
reencrypt reuses the complete edit log op. It'd be nice to store some
additional info in the EZ root xattr admins can differentiate between
completion and cancellation.
> Add functionality to re-encrypt EDEKs.
> --------------------------------------
>
> Key: HDFS-10899
> URL: https://issues.apache.org/jira/browse/HDFS-10899
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: encryption, kms
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Attachments: editsStored, HDFS-10899.01.patch, HDFS-10899.02.patch,
> HDFS-10899.03.patch, HDFS-10899.04.patch, HDFS-10899.05.patch,
> HDFS-10899.06.patch, HDFS-10899.07.patch, HDFS-10899.08.patch,
> HDFS-10899.09.patch, HDFS-10899.wip.2.patch, HDFS-10899.wip.patch, Re-encrypt
> edek design doc.pdf
>
>
> Currently when an encryption zone (EZ) key is rotated, it only takes effect
> on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key
> rotation, for improved security.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]