Wei-Chiu Chuang created HDDS-4970:
-------------------------------------
Summary: Significant overhead when DataNode is over-subscribed
Key: HDDS-4970
URL: https://issues.apache.org/jira/browse/HDDS-4970
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Datanode
Affects Versions: 1.0.0
Reporter: Wei-Chiu Chuang
Attachments: Screen Shot 2021-03-11 at 11.58.23 PM.png
Ran a microbenchmark in which concurrent clients read chunks from a DataNode.
As the number of clients grows, a significant amount of overhead shows up in
accessing a concurrent hash map, and the overhead grows exponentially with the
number of clients.
{code:java|title=ChunkUtils#processFileExclusively}
@VisibleForTesting
static <T> T processFileExclusively(Path path, Supplier<T> op) {
  // Busy-wait until this thread manages to insert the path into the shared
  // LOCKS set; every failed attempt is another hit on the concurrent hash map.
  for (;;) {
    if (LOCKS.add(path)) {
      break;
    }
  }
  try {
    return op.get();
  } finally {
    LOCKS.remove(path);
  }
}
{code}
In my test, having 64 concurrent clients reading chunks from a 1-disk DataNode
caused the DN to spend nearly half of its time adding entries to the LOCKS
object (a concurrent hash map).
!Screen Shot 2021-03-11 at 11.58.23 PM.png|width=640!
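The profile can be approximated outside Ozone. The snippet below is not the benchmark used here, just a hypothetical stand-alone repro (class name, chunk paths, and iteration counts are made up): 64 threads hammer the same spin-on-a-shared-set pattern as processFileExclusively, so a profiler attached to it should likewise show a large share of samples in adding to the concurrent set.
{code:java|title=Hypothetical stand-alone repro (illustrative only)}
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class LockContentionRepro {
  private static final Set<Path> LOCKS = ConcurrentHashMap.newKeySet();

  // Same shape as ChunkUtils#processFileExclusively.
  static <T> T processFileExclusively(Path path, Supplier<T> op) {
    for (;;) {
      if (LOCKS.add(path)) {
        break;
      }
    }
    try {
      return op.get();
    } finally {
      LOCKS.remove(path);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int clients = 64;  // mirrors the 64-client test in the description
    // A handful of chunk files standing in for a 1-disk DataNode's hot chunks.
    Path[] chunks = new Path[8];
    for (int i = 0; i < chunks.length; i++) {
      chunks[i] = Paths.get("/data/disk1/chunk-" + i);
    }
    ExecutorService pool = Executors.newFixedThreadPool(clients);
    for (int i = 0; i < clients; i++) {
      pool.submit(() -> {
        for (int j = 0; j < 1_000_000; j++) {
          Path p = chunks[ThreadLocalRandom.current().nextInt(chunks.length)];
          processFileExclusively(p, () -> 0);  // trivial op: cost is pure locking
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
  }
}
{code}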
Given that it is not uncommon to find HDFS DataNodes with tens of thousands of
incoming client connections, I expect to see similar traffic to an Ozone
DataNode at scale.
We should fix this code.
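One possible direction, sketched below and not necessarily the eventual fix: assuming the only requirement is per-path mutual exclusion, blocking on a per-path lock avoids both the busy-wait and the contention on a single shared set. The sketch uses Guava's Striped locks (Guava is already on the classpath, e.g. for @VisibleForTesting); the class name and stripe count are illustrative.
{code:java|title=Sketch: per-path locking via Guava Striped (illustrative only)}
import com.google.common.util.concurrent.Striped;
import java.nio.file.Path;
import java.util.concurrent.locks.Lock;
import java.util.function.Supplier;

final class StripedFileLocks {
  // 2048 stripes is an arbitrary example value; paths that hash to the same
  // stripe are serialized against each other, everything else runs in parallel.
  private static final Striped<Lock> LOCKS = Striped.lock(2048);

  static <T> T processFileExclusively(Path path, Supplier<T> op) {
    Lock lock = LOCKS.get(path);
    lock.lock();  // parks the thread instead of spinning on a shared set
    try {
      return op.get();
    } finally {
      lock.unlock();
    }
  }
}
{code}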