Stephen O'Donnell created HDDS-4427:
---------------------------------------

             Summary: Avoid ContainerCache in ContainerReader at Datanode startup
                 Key: HDDS-4427
                 URL: https://issues.apache.org/jira/browse/HDDS-4427
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
          Components: Ozone Datanode
    Affects Versions: 1.1.0
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


Testing on a dense datanode (200k containers, 45 disks), I see contention around 
the ContainerCache. Most of the time the reader threads run in parallel, but 
there are slowdowns where most of them are blocked waiting on the ContainerCache 
lock.

Examining jstack output, we can see that the runnable thread blocking the others 
is typically evicting a RocksDB instance from the cache:

{code}
"Thread-37" #131 prio=5 os_prio=0 tid=0x00007f8f49219800 nid=0x1c5e9 runnable 
[0x00007f86f7e78000]
   java.lang.Thread.State: RUNNABLE
        at org.rocksdb.RocksDB.closeDatabase(Native Method)
        at org.rocksdb.RocksDB.close(RocksDB.java:468)
        at org.apache.hadoop.hdds.utils.RocksDBStore.close(RocksDBStore.java:389)
        at org.apache.hadoop.ozone.container.common.utils.ReferenceCountedDB.cleanup(ReferenceCountedDB.java:79)
        at org.apache.hadoop.ozone.container.common.utils.ContainerCache.removeLRU(ContainerCache.java:106)
        at org.apache.commons.collections.map.LRUMap.addMapping(LRUMap.java:242)
        at org.apache.commons.collections.map.AbstractHashedMap.put(AbstractHashedMap.java:284)
        at org.apache.hadoop.ozone.container.common.utils.ContainerCache.getDB(ContainerCache.java:167)
        at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getDB(BlockUtils.java:63)
        at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.parseKVContainerData(KeyValueContainerUtil.java:165)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyAndFixupContainerData(ContainerReader.java:183)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyContainerFile(ContainerReader.java:160)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:137)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
        at java.lang.Thread.run(Thread.java:748)
{code}

The slowness seems to be driven by the RocksDB close call. It is usually fast, 
but often takes around 1ms. For example, here are some timings from that call 
(count, then duration in ms) after adding instrumentation to the code:

{code}
grep -a "metric: closing DB took" ozone-datanode.log | cut -d ":" -f 6 | sort -n | uniq -c
61940 0
128155 1
2786 2
236 3
53 4
42 5
17 6
10 7
8 8
15 9
{code}

The timer only had ms precision, which is why many of the values are zero. Even 
at 1ms per close, we can close at most 1000 instances per second, because this 
part of the code is serialized.
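
As a rough worked example of that ceiling, the sketch below sums the histogram 
above to estimate how much close time was spent in the serialized section during 
that run. The class and method names are made up for illustration, and because 
the timer only had ms precision the total is an approximation, not a measurement.

```java
public class CloseTimingEstimate {
    // {duration in ms, number of closes} pairs copied from the uniq -c output above.
    static final long[][] HISTOGRAM = {
        {0, 61940}, {1, 128155}, {2, 2786}, {3, 236}, {4, 53},
        {5, 42}, {6, 17}, {7, 10}, {8, 8}, {9, 15}
    };

    // Total number of close calls recorded in the histogram.
    static long totalCloses(long[][] histogram) {
        long closes = 0;
        for (long[] bucket : histogram) {
            closes += bucket[1];
        }
        return closes;
    }

    // Sum of (duration * count) over all buckets, in ms. Sub-millisecond
    // closes are recorded as 0, so they contribute nothing to this total.
    static long totalMillis(long[][] histogram) {
        long millis = 0;
        for (long[] bucket : histogram) {
            millis += bucket[0] * bucket[1];
        }
        return millis;
    }

    public static void main(String[] args) {
        long closes = totalCloses(HISTOGRAM);
        long millis = totalMillis(HISTOGRAM);
        System.out.println("closes recorded: " + closes);
        System.out.println("summed close time (ms): " + millis);
        // Assumption for comparison only: every close costs a flat 1 ms.
        System.out.println("at a flat 1 ms per close (s): " + closes / 1000.0);
    }
}
```

For the numbers above this gives 193262 closes and 135228 ms of recorded close 
time, so on the order of two minutes of startup went to closes that could not 
overlap with each other.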

At startup time, there is no value in caching the open container DBs. Every 
container on the node needs to be read, and the reads run in parallel, so we 
should simply open and close each container DB without caching the instance.
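
A minimal sketch of that open-and-close pattern is below. It is not the Ozone 
code: FakeContainerDb, scanVolume and scanAll are hypothetical stand-ins, and 
the "DB" only counts opens and closes. The point it illustrates is that each 
reader thread closes its own handle, so no shared cache lock sits in the close 
path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class UncachedStartupScan {
    // Hypothetical stand-in for a per-container DB handle; not an Ozone class.
    static final class FakeContainerDb implements AutoCloseable {
        static final AtomicInteger OPENED = new AtomicInteger();
        static final AtomicInteger CLOSED = new AtomicInteger();
        private final long containerId;

        FakeContainerDb(long containerId) {
            this.containerId = containerId;
            OPENED.incrementAndGet();
        }

        // Placeholder for whatever metadata the reader needs at startup.
        long readBlockCount() {
            return containerId % 7;
        }

        @Override
        public void close() {
            CLOSED.incrementAndGet();
        }
    }

    // Scan one volume: open, read and close each container DB in turn.
    // Nothing is cached, so the close happens on the calling thread only.
    static long scanVolume(List<Long> containerIds) {
        long blocks = 0;
        for (long id : containerIds) {
            try (FakeContainerDb db = new FakeContainerDb(id)) {
                blocks += db.readBlockCount();
            }
        }
        return blocks;
    }

    // One reader thread per volume, mirroring the per-volume readers in the stack trace.
    static void scanAll(List<List<Long>> volumes) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
        for (List<Long> volume : volumes) {
            pool.execute(() -> scanVolume(volume));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        scanAll(Arrays.asList(
            Arrays.asList(1L, 2L, 3L),
            Arrays.asList(4L, 5L)));
        System.out.println("opened=" + FakeContainerDb.OPENED.get()
            + " closed=" + FakeContainerDb.CLOSED.get());
    }
}
```

With a real RocksDB handle the close would still cost around 1ms, but the 
closes on different volumes could then overlap instead of queueing behind one 
lock.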



