[
https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134793#comment-16134793
]
Wei-Chiu Chuang commented on HDFS-10899:
----------------------------------------
Hi Xiao, thanks for the latest work!
I reviewed the latest patch (rev 014), mostly focused on ReencryptionHandler.
# ReencryptionHandler#(constructor)
{code:title=ReencryptionHandler#(constructor)}
if (reencryptBatchSize > 2000) {
// 2000 is based on buffer size = 512 *
1024, and SetXAttr op size ~ 200.
LOG.warn("Re-encryption batch size is {}. It could cause edit log buffer "
+ "to be full an trigger a logSync within the writelock, greatly "
+ "impacting namenode throughput.");
}
{code}
* The log needs to print reencryptBatchSize.
* It might also improve readability by making the hard coded number (2000) a
final static member variable.
* Please also add few more lines of comment to add the reference to
QuorumJournalManager#outputBufferCapacity.
# ReencryptionHandler#batchService, and ReencryptionUpdater#batchService
can be declared as ExecutorCompletionService<ReencryptionTask>
# ReencryptionHandler#removeZone(): It looks like you can de-dup some code by
reusing ReencryptionHandler#cancelZone().
# reencryptionHandler#reencryptEncryptionZone()
{code:title=reencryptionHandler#reencryptEncryptionZone()}
zs = getReencryptionStatus().getZoneStatus(zoneId);
assert zs != null;
{code}
* zoneId is obtained when holding FSDirectory read lock, release the lock, and
then acquire FSDirectory read lock again.
* This assertion is only correct if there will be only one ReencryptionHandler
running.
# ReencryptionStatus#updateZoneStatus() should check that zoneNode is an
encryption zone.
# Why is currentBatch a TreeMap?
# Does ZoneReencryptionStatus#getLastProcessedFile return the relative path? or
file name only? or absolute path?
# ReencryptionHandler#checkZoneReady
{code:title=ReencryptionHandler#checkZoneReady}
dir.getFSNamesystem()
.checkNameNodeSafeMode("NN is in safe mode, cannot " + "re-encrypt");
{code}
* The “+” is redundant.
# ReencryptionHandler#reencryptDir
{code:title=ReencryptionHandler#reencryptDir}
curr = parent;
curr =
reencryptDirInt(zoneId, currentBatch, curr, startAfters, ezKeyVerName);
{code}
Better rewrite as
{code}
curr =
reencryptDirInt(zoneId, currentBatch, parent, startAfters, ezKeyVerName);
{code}
# ReencryptionHandler#reencryptDirInt
{code:title=ReencryptionHandler#reencryptDirInt}
assert dir.hasReadLock();
Preconditions.checkNotNull(curr, "Current inode can't be null");
assert dir.hasReadLock();
{code}
* The second assertion is redundant.
# ReencryptionHandler#reencryptINode
{code:title=ReencryptionHandler#reencryptINode}
FileEncryptionInfo feInfo = FSDirEncryptionZoneOp
.getFileEncryptionInfo(dir, INodesInPath.fromINode(inode));
if (feInfo == null || ezKeyVerName.equals(feInfo.getEzKeyVersionName())) {
LOG.debug("File {} skipped re-encryption because edek's key version name"
+ " is not changed.", name);
return false;
}
{code}
* if feInfo is null, it would be unexpected — an INodeFile under an encryption
zone is supposed to have FileEncryptionInfo.
# ReencryptionHandler#submitCurrentBatch
* It Allocates a 2000-element map, copy it over, and then clear the map. That
looks suboptimal. Would it be feasible to wrap TreeMap and make a method that
simply assigns the TreeMap reference to another currentBatch?
# ReencryptionHandler.EDEKReencryptCallable:
{code}
// TODO: add reasonable retries here.
{code}
* I feel the retry should be handled by provider.
* if reencryptEdeks() returns numFailures > 0, call() should not return a new
ReencryptionTask object.
> Add functionality to re-encrypt EDEKs
> -------------------------------------
>
> Key: HDFS-10899
> URL: https://issues.apache.org/jira/browse/HDFS-10899
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: encryption, kms
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Attachments: editsStored, HDFS-10899.01.patch, HDFS-10899.02.patch,
> HDFS-10899.03.patch, HDFS-10899.04.patch, HDFS-10899.05.patch,
> HDFS-10899.06.patch, HDFS-10899.07.patch, HDFS-10899.08.patch,
> HDFS-10899.09.patch, HDFS-10899.10.patch, HDFS-10899.10.wip.patch,
> HDFS-10899.11.patch, HDFS-10899.12.patch, HDFS-10899.13.patch,
> HDFS-10899.14.patch, HDFS-10899.wip.2.patch, HDFS-10899.wip.patch, Re-encrypt
> edek design doc.pdf, Re-encrypt edek design doc V2.pdf
>
>
> Currently when an encryption zone (EZ) key is rotated, it only takes effect
> on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key
> rotation, for improved security.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]