[
https://issues.apache.org/jira/browse/HDDS-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marton Elek resolved HDDS-4427.
-------------------------------
Fix Version/s: 1.1.0
Resolution: Fixed
> Avoid ContainerCache in ContainerReader at Datanode startup
> -----------------------------------------------------------
>
> Key: HDDS-4427
> URL: https://issues.apache.org/jira/browse/HDDS-4427
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Datanode
> Affects Versions: 1.1.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
>
> Testing on a dense datanode (200k containers, 45 disks), I see contention
> around the ContainerCache. Most of the time the container reader threads run in
> parallel, but there are periodic slowdowns where most threads are blocked waiting
> on the ContainerCache lock.
> Examining jstack output, we can see that the runnable thread blocking the others
> is typically evicting a RocksDB instance from the cache:
> {code}
> "Thread-37" #131 prio=5 os_prio=0 tid=0x00007f8f49219800 nid=0x1c5e9 runnable
> [0x00007f86f7e78000]
> java.lang.Thread.State: RUNNABLE
> at org.rocksdb.RocksDB.closeDatabase(Native Method)
> at org.rocksdb.RocksDB.close(RocksDB.java:468)
> at
> org.apache.hadoop.hdds.utils.RocksDBStore.close(RocksDBStore.java:389)
> at
> org.apache.hadoop.ozone.container.common.utils.ReferenceCountedDB.cleanup(ReferenceCountedDB.java:79)
> at
> org.apache.hadoop.ozone.container.common.utils.ContainerCache.removeLRU(ContainerCache.java:106)
> at
> org.apache.commons.collections.map.LRUMap.addMapping(LRUMap.java:242)
> at
> org.apache.commons.collections.map.AbstractHashedMap.put(AbstractHashedMap.java:284)
> at
> org.apache.hadoop.ozone.container.common.utils.ContainerCache.getDB(ContainerCache.java:167)
> at
> org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getDB(BlockUtils.java:63)
> at
> org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.parseKVContainerData(KeyValueContainerUtil.java:165)
> at
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyAndFixupContainerData(ContainerReader.java:183)
> at
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyContainerFile(ContainerReader.java:160)
> at
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:137)
> at
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
> at java.lang.Thread.run(Thread.java:748)
> {code}
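> For illustration, here is a minimal, simplified sketch of the pattern behind that
> stack (a hypothetical NaiveDbCache, not the actual ContainerCache code): the LRU
> eviction closes the evicted DB handle while the cache lock is still held, so every
> other thread that needs a handle must wait for the close to finish.
> {code}
> // Simplified, hypothetical sketch of the contention pattern; not the real
> // ContainerCache. Eviction closes the evicted handle while the cache lock
> // is held, so a ~1ms close blocks every other reader thread.
> import java.util.LinkedHashMap;
> import java.util.Map;
>
> class NaiveDbCache<K, V extends AutoCloseable> {
>   private final int capacity;
>   private final LinkedHashMap<K, V> lru;
>
>   NaiveDbCache(int capacity) {
>     this.capacity = capacity;
>     this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
>       @Override
>       protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
>         if (size() > NaiveDbCache.this.capacity) {
>           try {
>             // The close happens while the caller still holds the cache lock.
>             eldest.getValue().close();
>           } catch (Exception e) {
>             throw new RuntimeException(e);
>           }
>           return true;
>         }
>         return false;
>       }
>     };
>   }
>
>   synchronized V get(K key) {
>     return lru.get(key);
>   }
>
>   synchronized void put(K key, V db) {
>     lru.put(key, db); // may trigger an eviction, and a DB close, under this lock
>   }
> }
> {code}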
> The slowness seems to be driven by the RocksDB close call. The close is
> generally fast, but often takes around 1ms. For example, here are some timings
> from that call after adding instrumentation to the code:
> {code}
> grep -a "metric: closing DB took" ozone-datanode.log | cut -d ":" -f 6 | sort
> -n | uniq -c
> 61940 0
> 128155 1
> 2786 2
> 236 3
> 53 4
> 42 5
> 17 6
> 10 7
> 8 8
> 15 9
> {code}
> The timer has only millisecond precision, which is why many readings are zero.
> Even at 1ms per close, we can close at most 1000 containers per second, and this
> part of the code is serialized behind the cache lock.
> At startup time, there is no value in caching the open containers: every
> container on the node is read once, in parallel across volumes, so we should
> simply open and close each container's DB without caching the instance, as
> sketched below.
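> A hypothetical sketch of that direction (not the committed change): open the
> container's RocksDB read-only, read the metadata needed to build the in-memory
> container data, and close it immediately, never touching the ContainerCache.
> {code}
> // Hypothetical sketch of the proposed approach; not the committed change.
> import java.io.File;
> import org.rocksdb.Options;
> import org.rocksdb.RocksDB;
> import org.rocksdb.RocksDBException;
>
> final class StartupContainerScan {
>
>   static void readContainerMetadata(File dbPath) throws RocksDBException {
>     try (Options options = new Options().setCreateIfMissing(false);
>          RocksDB db = RocksDB.openReadOnly(options, dbPath.getAbsolutePath())) {
>       // Read just the keys needed to populate the in-memory container data
>       // (block count, used bytes, ...). No cache insertion, no eviction,
>       // so startup threads never contend on the ContainerCache lock.
>     }
>     // The DB is closed here, on this thread only, in parallel across
>     // volumes, rather than while a shared lock is held.
>   }
> }
> {code}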