[
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103609#comment-16103609
]
Xiao Chen commented on HADOOP-14688:
------------------------------------
The heap dumps are too big to attach here, so I uploaded a screenshot of the
most relevant part of the analysis.
The 2 most duplicated strings (mG... and 0O...) are the 2 key version names. I
was running re-encryption on a zone with 1M files, and 2 different key versions
were in use among those files in this run.
I verified that the duplication goes away after interning.
[~daryn], do you think this makes sense? Thanks!
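For readers following along, here is a minimal sketch of the idea, assuming plain String.intern() as the interning mechanism (the attached patch may do it differently). The class KeyVersionSketch and the sample values are hypothetical stand-ins for illustration, not the actual Hadoop classes.

```java
public class InternSketch {
    // Hypothetical stand-in for a key version holder; NOT the real Hadoop class.
    static final class KeyVersionSketch {
        private final String name;
        private final String versionName;

        KeyVersionSketch(String name, String versionName) {
            // intern() returns the canonical pooled instance, so equal strings
            // held by many objects end up sharing a single String object.
            this.name = (name == null) ? null : name.intern();
            this.versionName = (versionName == null) ? null : versionName.intern();
        }

        String getName() { return name; }
        String getVersionName() { return versionName; }
    }

    public static void main(String[] args) {
        // new String(...) forces distinct objects, similar to strings that were
        // parsed independently for each file. The values are made up.
        KeyVersionSketch a = new KeyVersionSketch(new String("key1"), new String("key1@0"));
        KeyVersionSketch b = new KeyVersionSketch(new String("key1"), new String("key1@0"));

        // Reference comparison on purpose: checks that both fields point to
        // the same String object, not merely to equal contents.
        System.out.println(a.getVersionName() == b.getVersionName()); // prints "true"
    }
}
```

With 1M files and only 2 key versions, this would leave 2 version name strings on the heap instead of up to 1M copies.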
> Intern strings in KeyVersion and EncryptedKeyVersion
> ----------------------------------------------------
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
> Issue Type: Improvement
> Components: kms
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Attachments: HADOOP-14688.01.patch, heapdump analysis.png
>
>
> This is inspired by [[email protected]]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate
> objects for them.
> This is more important for HDFS-10899, where we try to re-encrypt all files'
> EDEKs in a given EZ. Those EDEKs all have the same key name, and mostly use
> no more than a couple of key version names.