[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273322#comment-16273322
 ] 

Misha Dmitriev commented on HDFS-12051:
---------------------------------------

Thank you for the review, [~yzhangal]. Here are the answers to your questions:

{quote}1. The original NameCache works like this: when loading the fsimage, it 
puts names into a transient cache and remembers the count of each name; if the 
count of a name reaches a threshold (configurable, with a default of 10), it 
promotes the name to the permanent cache. After the fsimage is loaded, it 
cleans up the transient cache and freezes the final cache. The problem 
described here is about calculating a snapshot diff, which happens after 
fsimage loading. Thus any new name, even if it appears many times, would not 
benefit from the NameCache. Let's call this solution1; your change is to always 
allow the cache to be updated, and let's call it solution2.{quote}

I would say that this solution can be problematic even during fsimage loading. 
What if the fsimage contains a very large number of distinct names? Then the 
NameCache may grow to several million entries, and a java.util.HashMap of that 
size is very costly to operate because, for one thing, a separate HashMap$Entry 
object is created for each key-value pair. That becomes a big burden on the GC 
and a big overhead in terms of CPU cycles.
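
To put a rough number on it (a back-of-the-envelope estimate, assuming a 64-bit 
JVM with compressed oops): each entry costs about 32 bytes for the 
HashMap$Entry/Node object alone, plus a reference slot in the hash table and 
whatever object is stored as the value. Five million entries therefore mean 
well over 150MB of pure bookkeeping and five million extra objects for the GC 
to trace, before counting the name bytes themselves.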

{quote}2. If we modify solution1 to keep updating the cache instead of freezing 
it, we have a chance to help with the problem being solved here; however, 
depending on the threshold, the number of entries in the final cache of 
solution1 can be very different, and thus the memory footprint can be very 
different.{quote}

See above, and also see my initial explanation in this ticket. If we allow the 
cache in solution1 to grow, its memory footprint may become comparable to the 
memory savings that it achieves, while creating a lot of additional GC 
pressure, as explained above.

{quote}3. The cache size to be configured in solution2 would impact the final 
memory footprint too. If it's configured too small, we might end up with many 
duplicates anyway. So having a reasonable default configuration would be 
important. It's so internal that we may not easily be able to make a good 
recommendation to users on when to adjust it.{quote}

Yes. Ideally, more measurements need to be done and a better algorithm for 
selecting its size should be devised. But let's do it incrementally. Right now, 
this cache consumes very little extra memory yet saves quite a lot of it. This 
is much better than what we had before.

{quote}4. How much memory are we saving when saying "8.5% reduction"?{quote}

In this particular case, here are the two lines from the jxray memory reports, 
taken before and after the change, that show the overhead of duplicate byte[] 
arrays:

Types of duplicate objects:
     Ovhd         Num objs  Num unique objs   Class name

Before:
346,198K (12.6%)   12097893      3714559         byte[]
After:
100,440K (3.9%)   6208877      3855398         byte[]

So the overhead went down from ~333MB to ~100MB in this synthetic workload 
example. Note that in the original, production heap dump that I started with, 
the heap is much bigger and the overhead of duplicate byte[] arrays is much 
higher, about 3GB:

3,220,272K (6.5%)   104749528      25760871         byte[]
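
For clarity, this is how I read the "8.5%" figure (my arithmetic, based on the 
report lines above): in the synthetic workload the duplicate byte[] overhead 
dropped from 346,198K to 100,440K, i.e. roughly 245,000K (~240MB) was 
reclaimed. Since 346,198K was 12.6% of the heap, the heap was about 2.7GB, and 
~240MB of it comes to roughly 9%, which is in the same ballpark as the 8.5% 
reduction quoted in the question.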

{quote}
6. Solution2 might benefit some cases but make other cases worse. If we decide 
to proceed, I wonder if we can make both solution1 and solution2 available, and 
make it switchable when needed.
{quote}

For the use cases that I considered, it was a clear net benefit. I've also 
explained the very real problems with solution1 in the answer to your question 
(1): it may create a lot of extra memory pressure, especially when the data is 
unskewed (i.e. the size of the resulting cache is comparable to the total 
number of names). And if a limit is put on the size of the name cache in 
solution1, it will have the same drawback as solution2, while still requiring 
more memory.

{quote}7. Suggest adding more comments in the code. For example, for (int 
colsnChainLen = 0; colsnChainLen < 5; colsnChainLen++) {: what this does, and 
why "5".{quote}

Certainly, will do.
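
To give an idea of the kind of comment I have in mind, below is a simplified 
sketch of a fixed-size byte[] interning cache with a bounded collision chain. 
This is illustrative only: the class and member names are made up, and the 
real code in the patch may be organized differently.

{code}
import java.util.Arrays;

/** Simplified illustration of a fixed-size byte[] interning cache. */
public class ByteArrayCacheSketch {
  // Probe at most this many slots along a collision chain. A small bound
  // keeps every lookup cheap and predictable; probing further would raise
  // the hit rate only marginally while adding CPU cost to every miss.
  private static final int MAX_COLLISION_CHAIN_LEN = 5;

  private final byte[][] cache;  // fixed number of slots, never grows
  private final int mask;        // slot mask; the cache length is a power of two

  /** @param powerOfTwoSize number of slots; must be a power of two. */
  public ByteArrayCacheSketch(int powerOfTwoSize) {
    cache = new byte[powerOfTwoSize][];
    mask = powerOfTwoSize - 1;
  }

  /** Returns a canonical array equal to {@code name}, caching it if possible. */
  public byte[] intern(byte[] name) {
    int hash = Arrays.hashCode(name);
    for (int colsnChainLen = 0; colsnChainLen < MAX_COLLISION_CHAIN_LEN; colsnChainLen++) {
      int slot = (hash + colsnChainLen) & mask;
      byte[] cached = cache[slot];
      if (cached == null) {
        cache[slot] = name;   // empty slot: remember this copy for future callers
        return name;
      }
      if (Arrays.equals(cached, name)) {
        return cached;        // hit: drop the duplicate and reuse the cached copy
      }
      // A different name occupies this slot: try the next slot in the chain.
    }
    // Chain exhausted: give up without evicting anything and keep the original.
    return name;
  }
}
{code}

The important property is that the footprint is bounded by the slot array no 
matter how many distinct names pass through the cache, and no per-entry objects 
are allocated on the lookup path.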

> Intern INodeFileAttributes$SnapshotCopy.name byte[] arrays to save memory
> -------------------------------------------------------------------------
>
>                 Key: HDFS-12051
>                 URL: https://issues.apache.org/jira/browse/HDFS-12051
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: HDFS-12051.01.patch, HDFS-12051.02.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>      Ovhd         Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528      25760871         byte[]
> ....
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>      <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>      <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
> ....
> {code}
> To eliminate this duplication and reclaim memory, we will need to write a 
> small class similar to StringInterner, but designed specifically for byte[] 
> arrays.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
