[
https://issues.apache.org/jira/browse/HDDS-10908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang updated HDDS-10908:
-----------------------------------
Description:
The current configuration has the XceiverServerGrpc boss and worker event loop
groups share the same thread pool, whose size is (number of volumes *
hdds.datanode.read.chunk.threads.per.volume) / 10, while the executor thread
pool size is number of volumes * hdds.datanode.read.chunk.threads.per.volume.
The event loop group thread pool is too small: with a single volume, that
formula yields just one thread shared between the boss and worker groups.
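As a rough illustration, the sizing formulas work out as follows. This is a simplified sketch, not the actual XceiverServerGrpc code; the class and method names are hypothetical:

```java
// Sketch of the sizing formulas described above; names are hypothetical,
// not Ozone's actual code.
public class EventLoopSizing {

    // Shared boss/worker event loop group size:
    // (volumes * hdds.datanode.read.chunk.threads.per.volume) / 10
    static int eventLoopGroupSize(int numVolumes, int threadsPerVolume) {
        return numVolumes * threadsPerVolume / 10;
    }

    // Executor pool size:
    // volumes * hdds.datanode.read.chunk.threads.per.volume
    static int executorPoolSize(int numVolumes, int threadsPerVolume) {
        return numVolumes * threadsPerVolume;
    }

    public static void main(String[] args) {
        // Single volume at the default threadsPerVolume = 10:
        // just one event loop thread shared between boss and worker.
        System.out.println(eventLoopGroupSize(1, 10)); // 1
        System.out.println(executorPoolSize(1, 10));   // 10
    }
}
```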
Using the freon DN Echo tool, I found that even a slight increase in the pool
size significantly improves throughput:
{noformat}
sudo -u hdfs ozone freon dne --clients=32 --container-id=1001 -t 32 -n 10000000
--sleep-time-ms=0 --read-only
hdds.datanode.read.chunk.threads.per.volume = 10 (default):
mean rate = 44125.45 calls/second
hdds.datanode.read.chunk.threads.per.volume = 20:
mean rate = 61322.60 calls/second
hdds.datanode.read.chunk.threads.per.volume = 40:
mean rate = 77951.91 calls/second
hdds.datanode.read.chunk.threads.per.volume = 100:
mean rate = 65573.07 calls/second
hdds.datanode.read.chunk.threads.per.volume = 1000:
mean rate = 25079.32 calls/second
{noformat}
So it appears that increasing the default value to 40 has a positive impact.
Alternatively, we should consider decoupling the thread pool size from the
number of volumes; otherwise the value becomes too large for, say, 48 disks.
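One possible shape for the decoupling is to keep the current formula but cap it relative to CPU count. This is only an illustrative sketch under the assumption that a processor-based cap is acceptable; the constants are not a tested recommendation:

```java
// Illustrative alternative sizing, not a proposed patch: derive the event
// loop group size from the existing formula but cap it so a many-disk node
// does not get an oversized group.
public class CappedEventLoopSizing {

    static int cappedEventLoopGroupSize(int numVolumes, int threadsPerVolume,
                                        int cap) {
        int raw = numVolumes * threadsPerVolume / 10;
        return Math.min(Math.max(raw, 1), cap);
    }

    public static void main(String[] args) {
        // Hypothetical cap proportional to CPU count.
        int cap = 2 * Runtime.getRuntime().availableProcessors();
        // 48 volumes at threadsPerVolume = 40 would otherwise yield
        // 48 * 40 / 10 = 192 event loop threads.
        System.out.println(48 * 40 / 10); // 192
        System.out.println(cappedEventLoopGroupSize(48, 40, cap));
    }
}
```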
Note:
For reference, DN echo in Ratis read-only mode achieves about 83k requests per
second on the same host, and OM echo in read-only mode achieves about 38k
requests per second.
> Increase DataNode XceiverServerGrpc event loop group size
> ---------------------------------------------------------
>
> Key: HDDS-10908
> URL: https://issues.apache.org/jira/browse/HDDS-10908
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Datanode
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)