[
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129144#comment-16129144
]
Xiao Chen commented on HADOOP-14688:
------------------------------------
Bumping up on this...
I understand the concern about the added overhead for a normal operation like
getFileEncryptionInfo. But in internal runs I did not see this interning
cause any visible impact on NN throughput.
On the other hand, the heap is pretty ugly without this change during re-encryption.
Attaching a report generated by [jxray|http://www.jxray.com/]. The most relevant
section is:
{quote}
7. DUPLICATE STRINGS
Total strings: 2,570,432  Unique strings: 1,033,993  Duplicate values: 3,559
Overhead: 170,572K (8.4%)
Top duplicate strings:
Ovhd             Num char[]s  Num objs  Value
103,775K (5.1%)  830205       830205    "mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf"
23,042K (1.1%)   184337       184337    "0OFmjElLqXgtjvWKkgfRoLpUj92dHrEaQCPeh3VDh8V"
8,668K (0.4%)    184937       184937    "EEK"
2,853K (0.1%)    12176        12176     "POST /kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek?eek_op=reencrypt HTTP/1.1"
2,473K (0.1%)    12177        12177     "/kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek?eek_op=reencrypt"
2,298K (0.1%)    13374        13374     "/kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek"
{quote}
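As a rough reading of the numbers in the report, the per-copy cost of the top duplicates can be estimated by dividing the overhead by the object count. The sketch below assumes that "K" means 1024 bytes and that the overhead is spread evenly over the reported objects; the report does not state either, and the class and method names are made up for illustration.

```java
class OverheadPerCopy {
    // Estimated bytes per duplicate copy.
    // Assumption: "K" in the report is 1024 bytes, and the overhead is
    // spread evenly across the reported number of objects.
    static double bytesPerCopy(long overheadK, long numObjs) {
        return overheadK * 1024.0 / numObjs;
    }

    public static void main(String[] args) {
        // Figures copied from the "Top duplicate strings" rows of the report.
        System.out.printf("first key version name:  %.1f bytes%n",
                bytesPerCopy(103_775, 830_205));
        System.out.printf("second key version name: %.1f bytes%n",
                bytesPerCopy(23_042, 184_337));
        System.out.printf("\"EEK\":                    %.1f bytes%n",
                bytesPerCopy(8_668, 184_937));
    }
}
```

Under those assumptions this works out to roughly 128 bytes per copy of each 43-character key version name and roughly 48 bytes per copy of "EEK", which would be consistent with one String object plus its backing char[] per duplicate.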
> Intern strings in KeyVersion and EncryptedKeyVersion
> ----------------------------------------------------
>
> Key: HADOOP-14688
> URL: https://issues.apache.org/jira/browse/HADOOP-14688
> Project: Hadoop Common
> Issue Type: Improvement
> Components: kms
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Attachments: GC root of the String.png, HADOOP-14688.01.patch,
> heapdump analysis.png, jxray.report
>
>
> This is inspired by [[email protected]]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate
> objects for them.
> This is more important to HDFS-10899, where we try to re-encrypt all files'
> EDEKs in a given EZ. Those EDEKs all have the same key name, and mostly use
> no more than a couple of key version names.
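The idea in the description can be illustrated with a minimal sketch. This is not the attached patch: the class `KeyVersionSketch` and its fields are hypothetical stand-ins, and it uses the JDK's `String.intern()` only for illustration; the actual change may use a different interning mechanism.

```java
import java.util.ArrayList;
import java.util.List;

class InternSketch {
    // Hypothetical stand-in for a key version holder; not the Hadoop class.
    static final class KeyVersionSketch {
        final String keyName;
        final String versionName;

        KeyVersionSketch(String keyName, String versionName) {
            // intern() returns the canonical instance from the JVM string pool,
            // so equal names share one object instead of one copy per holder.
            this.keyName = keyName == null ? null : keyName.intern();
            this.versionName = versionName == null ? null : versionName.intern();
        }
    }

    public static void main(String[] args) {
        List<KeyVersionSketch> versions = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // new String(...) forces a distinct object each time, much as
            // parsing each response separately would.
            versions.add(new KeyVersionSketch(new String("key1"), new String("key1@0")));
        }
        // Reference equality holds only because the names were interned.
        boolean shared = versions.get(0).versionName == versions.get(2).versionName;
        System.out.println("shared instance: " + shared);
    }
}
```

The trade-off raised in the comment above is that every construction pays for a pool lookup, in exchange for holding one copy of each distinct name instead of one copy per object.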