[ https://issues.apache.org/jira/browse/HDDS-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek resolved HDDS-4427.
-------------------------------
    Fix Version/s: 1.1.0
       Resolution: Fixed

> Avoid ContainerCache in ContainerReader at Datanode startup
> -----------------------------------------------------------
>
>                 Key: HDDS-4427
>                 URL: https://issues.apache.org/jira/browse/HDDS-4427
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: Ozone Datanode
>    Affects Versions: 1.1.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>
> Testing on a dense datanode (200k containers, 45 disks) I see contention 
> around the ContainerCache. Most of the time the threads run in parallel, 
> but there are periodic slowdowns where most of them block waiting on the 
> ContainerCache lock.
> Examining jstack output, the runnable thread blocking the others is 
> typically evicting a RocksDB instance from the cache:
> {code}
> "Thread-37" #131 prio=5 os_prio=0 tid=0x00007f8f49219800 nid=0x1c5e9 runnable 
> [0x00007f86f7e78000]
>    java.lang.Thread.State: RUNNABLE
>         at org.rocksdb.RocksDB.closeDatabase(Native Method)
>         at org.rocksdb.RocksDB.close(RocksDB.java:468)
>         at 
> org.apache.hadoop.hdds.utils.RocksDBStore.close(RocksDBStore.java:389)
>         at 
> org.apache.hadoop.ozone.container.common.utils.ReferenceCountedDB.cleanup(ReferenceCountedDB.java:79)
>         at 
> org.apache.hadoop.ozone.container.common.utils.ContainerCache.removeLRU(ContainerCache.java:106)
>         at 
> org.apache.commons.collections.map.LRUMap.addMapping(LRUMap.java:242)
>         at 
> org.apache.commons.collections.map.AbstractHashedMap.put(AbstractHashedMap.java:284)
>         at 
> org.apache.hadoop.ozone.container.common.utils.ContainerCache.getDB(ContainerCache.java:167)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getDB(BlockUtils.java:63)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.parseKVContainerData(KeyValueContainerUtil.java:165)
>         at 
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyAndFixupContainerData(ContainerReader.java:183)
>         at 
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyContainerFile(ContainerReader.java:160)
>         at 
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:137)
>         at 
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
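> To see why a single eviction stalls everything, here is a simplified, 
> hypothetical sketch of the pattern the stack trace implies (a plain-Java 
> stand-in for ContainerCache/LRUMap, not the actual Ozone code): every 
> reader goes through one synchronized getDB(), and when the cache is full 
> the evicted RocksDB handle is closed while that lock is still held.
> {code}
> import java.util.LinkedHashMap;
> import java.util.Map;
> import org.rocksdb.Options;
> import org.rocksdb.RocksDB;
> import org.rocksdb.RocksDBException;
> 
> // Hypothetical stand-in for the cache behaviour seen in the stack trace:
> // a bounded, access-ordered map of open RocksDB handles whose eviction
> // closes the evicted instance under the cache lock.
> public class DbHandleCache {
>   private final int maxSize;
>   private final Map<String, RocksDB> cache;
> 
>   public DbHandleCache(int maxSize) {
>     this.maxSize = maxSize;
>     this.cache = new LinkedHashMap<String, RocksDB>(16, 0.75f, true) {
>       @Override
>       protected boolean removeEldestEntry(Map.Entry<String, RocksDB> eldest) {
>         if (size() > DbHandleCache.this.maxSize) {
>           // The ~1ms native close happens here, with the cache lock held,
>           // so every other ContainerReader thread waits on it.
>           eldest.getValue().close();
>           return true;
>         }
>         return false;
>       }
>     };
>   }
> 
>   // All readers funnel through this one synchronized method; a slow
>   // eviction therefore serializes container loading across all volumes.
>   public synchronized RocksDB getDB(String containerDbPath)
>       throws RocksDBException {
>     RocksDB db = cache.get(containerDbPath);
>     if (db == null) {
>       db = RocksDB.open(new Options().setCreateIfMissing(false),
>           containerDbPath);
>       cache.put(containerDbPath, db); // may trigger removeEldestEntry above
>     }
>     return db;
>   }
> }
> {code}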
> The slowness seems to be driven by the RocksDB close call. It is generally 
> fast, but often takes around 1ms. For example, here are some timings from 
> that call after adding instrumentation to the code:
> {code}
> grep -a "metric: closing DB took" ozone-datanode.log | cut -d ":" -f 6 | sort -n | uniq -c
> 61940 0
> 128155 1
> 2786 2
> 236 3
> 53 4
> 42 5
> 17 6
> 10 7
> 8 8
> 15 9
> {code}
> The timer had only millisecond precision, which is why many of the samples 
> are zero. Even at 1ms per close we can only close about 1000 containers per 
> second, and this point of the code is serialized; at 200k containers that 
> is a lower bound of roughly 200 seconds spent just closing DBs.
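> The instrumentation itself was nothing more than timing the close call and 
> logging the result; a rough sketch of that shape is below (a hypothetical 
> helper; the logger name and exact message format are assumptions, not the 
> actual patch):
> {code}
> import org.rocksdb.RocksDB;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> // Hypothetical helper: wrap the native close() and log its duration in
> // milliseconds so the figures can be aggregated with grep/sort/uniq.
> public final class TimedClose {
>   private static final Logger LOG = LoggerFactory.getLogger(TimedClose.class);
> 
>   private TimedClose() { }
> 
>   public static void closeAndLog(RocksDB db) {
>     long start = System.currentTimeMillis();
>     db.close();
>     long elapsedMs = System.currentTimeMillis() - start;
>     // Millisecond precision only, hence the large number of 0ms samples.
>     LOG.info("metric: closing DB took :{}", elapsedMs);
>   }
> }
> {code}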
> At startup time there is no value in caching the open containers: every 
> container on the node is read once during the startup scan, so we should 
> simply open and close each container's DB directly without caching the 
> instance.
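> A rough sketch of that idea is below (the class name, method and the 
> metadata key are illustrative only, not the actual patch): each 
> ContainerReader thread opens the container's RocksDB directly, reads the 
> metadata it needs, and closes it again, bypassing the shared ContainerCache 
> and its lock entirely.
> {code}
> import java.nio.charset.StandardCharsets;
> 
> import org.rocksdb.Options;
> import org.rocksdb.RocksDB;
> import org.rocksdb.RocksDBException;
> 
> // Illustrative only: open the container DB, read what the startup scan
> // needs, and close it on the same thread. No shared cache, no shared lock.
> public final class UncachedContainerRead {
> 
>   private UncachedContainerRead() { }
> 
>   public static long readBlockCount(String containerDbPath)
>       throws RocksDBException {
>     try (Options options = new Options().setCreateIfMissing(false);
>          RocksDB db = RocksDB.open(options, containerDbPath)) {
>       // Stand-in for the real lookups done in
>       // KeyValueContainerUtil.parseKVContainerData(); the "#BLOCKCOUNT"
>       // key is a hypothetical example to keep the sketch self-contained.
>       byte[] value = db.get("#BLOCKCOUNT".getBytes(StandardCharsets.UTF_8));
>       return value == null ? 0
>           : Long.parseLong(new String(value, StandardCharsets.UTF_8));
>     }
>     // The DB is closed when the try block exits, on this thread only, so a
>     // slow native close cannot block the readers working on other volumes.
>   }
> }
> {code}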



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
