[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

Misha Dmitriev (JIRA) Fri, 09 Feb 2018 12:48:25 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358939#comment-16358939
 ]


Misha Dmitriev commented on HDFS-12051:
---------------------------------------

Thank you for the review [~atm] Please see my answers below.

_> Is there a way to disable the cache entirely, if we find that there's some 
bug in the implementation? e.g. if you set the ratio to 0, does everything 
behave correctly?_

It won't, but I can add this functionality.

_> How hard would it be to not make this class a static singleton, and instead 
have a single instance of it in the NN that can be referenced, perhaps as an 
instance variable of the {{FSNamesystem}}? That seems a bit less fragile if 
it's possible, and could allow for the class to be more easily tested._

As you can see, this class is not really a static singleton. Its public API is 
indeed a single static put() method, but inside there is a singleton _instance_ 
of NameCache, with its instance methods. Initially I didn't have this singleton 
at all, and it indeed was an instance variable of FSNamesystem. But later I 
found that there are several other places in the code where duplicate byte[] 
arrays are generated, and where it would be very hard to pass this instance 
variable. So I ended up with this static API, which makes it easier to use 
NameCache anywhere in the code. But ability to test it is not compromised.

_> Have you done any verification of the correctness of this cache in any of 
your benchmarks? e.g. something that walked the file system tree to ensure that 
the names are identical with/without this cache I think would help allay 
correctness concerns._

Well, I can try that, but honestly, how paranoid should we be? In my opinion, 
this code is simple enough to pass with a combination of unit tests and some 
runs in the cluster.

_> I'd really like to see some more tests of the actual cache implementation 
itself, e.g. in the presence of hash collisions, behavior at the boundaries of 
the main cache array, overlap of slots probed in the open addressing search, 
other edge cases, etc._

_>I see that precommit raised some findbugs warnings and had some failed unit 
tests. Can we please address the findbugs warnings, and also confirm that those 
unit test failures are unrelated?_

The single findbugs issue has been already explained. It's legitimate, but we 
intentionally do something that wouldn't be good in general (use a volatile 
field and increment it without synchronization) just to enable some information 
for testing without degrading performance in production. As for unit tests - 
well, every time some different unit test fails, which makes me think that they 
are flaky (I had same experience in the past with my other changes in HDFS). I 
looked at them but couldn't see any obvious signs that the problems are related 
to my code. There are timeouts and similar things that tend to happen in flaky 
tests. Here I think I really need help from someone else in the HDFS team.

_> Seems like this cache will have a somewhat odd behavior if an item hashes to 
a slot that's within {{MAX_COLLISION_CHAIN_LEN}} slots of the end of the array, 
in that it looks like we'll just probe the same slot over and over again up to 
{{MAX_COLLISION_CHAIN_LEN}} times. Is this to be expected?_

I don't think there is any problem here. We use the same formula to get the 
next slot, and it wraps around the array boundary correctly. Take a look at the 
test program below that uses the same formula, and its output:
{code:java}
public static void main(String args[]) {
  int capacity = 4;
  int slot = 0;
  for (int i = 0; i < 8; i++) {
    slot = (slot + 1) & (capacity - 1);             
    System.out.println("slot = " + slot);
  }
}

> java Test
slot = 1
slot = 2
slot = 3
slot = 0
slot = 1
slot = 2
slot = 3
slot = 0{code}
 

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12051
>                 URL: https://issues.apache.org/jira/browse/HDFS-12051
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>            Priority: Major
>         Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch
>
>
> When snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>      Ovhd         Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528      25760871         byte[]
> ....
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>      <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>      <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
> ....
> {code}
> There are several other places in NameNode code which also produce duplicate 
> {{byte[]}} arrays.
> To eliminate this duplication and reclaim memory, we will need to write a 
> small class similar to StringInterner, but designed specifically for byte[] 
> arrays.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

Reply via email to