[ 
https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129144#comment-16129144
 ] 

Xiao Chen commented on HADOOP-14688:
------------------------------------

Bumping up on this...

I understand the concern about adding overhead to a common operation like 
getFileEncryptionInfo, but in internal runs I did not see the interning 
cause any visible impact on NN throughput.

On the other hand, without this change the heap looks pretty ugly during 
re-encryption. Attaching a report generated by [jxray|http://www.jxray.com/]. 
The most relevant section is:
{quote}
7. DUPLICATE STRINGS

Total strings: 2,570,432  Unique strings: 1,033,993  Duplicate values: 3,559  Overhead: 170,572K (8.4%)

Top duplicate strings:
    Ovhd          Num char[]s   Num objs   Value

103,775K (5.1%)   830205        830205     "mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf"
 23,042K (1.1%)   184337        184337     "0OFmjElLqXgtjvWKkgfRoLpUj92dHrEaQCPeh3VDh8V"
  8,668K (0.4%)   184937        184937     "EEK"
  2,853K (0.1%)    12176         12176     "POST /kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek?eek_op=reencrypt HTTP/1.1"
  2,473K (0.1%)    12177         12177     "/kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek?eek_op=reencrypt"
  2,298K (0.1%)    13374         13374     "/kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek"

{quote}
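
To illustrate the effect in isolation, here is a minimal sketch. It is not 
the patch itself: VersionName is a hypothetical stand-in for the classes that 
hold a key version name, and new String(...) is assumed to mimic a fresh 
copy of the name arriving with each parsed response. It counts how many 
distinct String objects stay referenced with and without String.intern().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternSketch {
    /** Hypothetical holder of a key version name; not the Hadoop class. */
    static final class VersionName {
        final String name;

        VersionName(String name, boolean intern) {
            // With interning, every holder shares the single pooled instance.
            this.name = intern ? name.intern() : name;
        }
    }

    /** Builds n holders, each constructed from its own copy of the same name. */
    static List<VersionName> build(int n, boolean intern) {
        List<VersionName> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // new String(...) forces a separate object, standing in for a
            // name that was deserialized again for each EDEK.
            String copy = new String("mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf");
            out.add(new VersionName(copy, intern));
        }
        return out;
    }

    /** Counts distinct String objects by identity, not by equals(). */
    static int distinctStringObjects(List<VersionName> versions) {
        Set<String> seen =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (VersionName v : versions) {
            seen.add(v.name);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctStringObjects(build(1000, false))); // 1000
        System.out.println(distinctStringObjects(build(1000, true)));  // 1
    }
}
```

Without interning, every holder keeps its own copy of the name alive, which 
is the pattern the duplicate string section above shows. The trade-off is the 
extra string pool lookup on each construction.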

> Intern strings in KeyVersion and EncryptedKeyVersion
> ----------------------------------------------------
>
>                 Key: HADOOP-14688
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14688
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: kms
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: GC root of the String.png, HADOOP-14688.01.patch, 
> heapdump analysis.png, jxray.report
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same for a bunch of 
> {{KeyVersion}} and {{EncryptedKeyVersion}}. We should not create duplicate 
> objects for them.
> This is more important for HDFS-10899, where we try to re-encrypt all files' 
> EDEKs in a given EZ. Those EDEKs all have the same key name, and mostly use 
> no more than a couple of key version names.



