[ https://issues.apache.org/jira/browse/HADOOP-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129144#comment-16129144 ]
Xiao Chen edited comment on HADOOP-14688 at 8/16/17 5:48 PM:
-------------------------------------------------------------

Bumping this up...

I understand the concern about the added overhead for a normal operation like getFileEncryptionInfo, but in internal runs I did not see this interning cause any visible impact on NN throughput.

On the other hand, the heap is pretty ugly without this change during re-encryption. Attaching a report generated by [jxray|http://www.jxray.com/]. The most relevant section is:
{quote}
7. DUPLICATE STRINGS

Total strings: 2,570,432
Unique strings: 1,033,993
Duplicate values: 3,559
Overhead: 170,572K (8.4%)

Top duplicate strings:

Ovhd             Num char[]s  Num objs  Value
103,775K (5.1%)  830205       830205    "mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf"
 23,042K (1.1%)  184337       184337    "0OFmjElLqXgtjvWKkgfRoLpUj92dHrEaQCPeh3VDh8V"
  8,668K (0.4%)  184937       184937    "EEK"
  2,853K (0.1%)  12176        12176     "POST /kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek?eek_op=reencrypt HTTP/1.1"
  2,473K (0.1%)  12177        12177     "/kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek?eek_op=reencrypt"
  2,298K (0.1%)  13374        13374     "/kms/v1/keyversion/mGscEhbOphwD8GQkGxVfHnV4PVo3lhmpPWurw3vGsLf/_eek"
{quote}
With the fix in place, the 6.2% goes away (5.1% + 1.1%, the top two entries above).

> Intern strings in KeyVersion and EncryptedKeyVersion
> ----------------------------------------------------
>
>                 Key: HADOOP-14688
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14688
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: kms
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: GC root of the String.png, HADOOP-14688.01.patch,
>                      heapdump analysis.png, jxray.report
>
>
> This is inspired by [~mi...@cloudera.com]'s work on HDFS-11383.
> The key names and key version names are usually the same across a large
> number of {{KeyVersion}} and {{EncryptedKeyVersion}} objects. We should not
> create duplicate String objects for them.
> This matters even more for HDFS-10899, where we try to re-encrypt all files'
> EDEKs in a given EZ. Those EDEKs all have the same key name, and mostly use
> no more than a couple of key version names.
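To make the idea in the description concrete, here is a minimal Java sketch of interning the name fields at construction time. The class names {{InternedKeyVersion}} and {{InternDemo}} are hypothetical stand-ins, not Hadoop's actual {{KeyVersion}} or the attached patch, which may well be implemented differently (for example with a different interner).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a key version holder; NOT Hadoop's KeyVersion class.
final class InternedKeyVersion {
    private final String keyName;
    private final String versionName;

    InternedKeyVersion(String keyName, String versionName) {
        // String.intern() returns the canonical instance from the JVM string pool,
        // so objects built with equal names end up sharing one String.
        this.keyName = keyName == null ? null : keyName.intern();
        this.versionName = versionName == null ? null : versionName.intern();
    }

    String getKeyName() { return keyName; }

    String getVersionName() { return versionName; }
}

final class InternDemo {
    public static void main(String[] args) {
        List<InternedKeyVersion> versions = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // new String(...) forces a distinct object each time, mimicking names
            // that were parsed independently (e.g. from separate responses).
            versions.add(new InternedKeyVersion(new String("myKey"), new String("myKey@0")));
        }
        // Reference equality (==) holds only because the names were interned.
        boolean shared = versions.get(0).getVersionName() == versions.get(2).getVersionName();
        System.out.println("version name shared: " + shared);
    }
}
```

The trade-off discussed in the comment above is visible here: each construction pays for a string pool lookup, and in exchange many objects carrying the same key name or key version name hold a reference to a single String instead of a separate copy each.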