[
https://issues.apache.org/jira/browse/HDDS-10908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang updated HDDS-10908:
-----------------------------------
Description:
The current configuration has the XceiverServerGrpc boss and worker event loop
groups share the same thread pool, whose size is (number of volumes *
hdds.datanode.read.chunk.threads.per.volume) / 10, while the executor thread
pool size is number of volumes * hdds.datanode.read.chunk.threads.per.volume.
The event loop group thread pool is too small: with a single volume, that
formula yields just one thread shared between the boss and worker groups.
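As a rough illustration, the sizing formulas work out as follows. This is a simplified sketch, not the actual XceiverServerGrpc code; the class and method names are hypothetical:

```java
// Sketch of the sizing formulas described above; names are hypothetical,
// not Ozone's actual code.
public class EventLoopSizing {

    // Shared boss/worker event loop group size:
    // (volumes * hdds.datanode.read.chunk.threads.per.volume) / 10
    static int eventLoopGroupSize(int numVolumes, int threadsPerVolume) {
        return numVolumes * threadsPerVolume / 10;
    }

    // Executor pool size:
    // volumes * hdds.datanode.read.chunk.threads.per.volume
    static int executorPoolSize(int numVolumes, int threadsPerVolume) {
        return numVolumes * threadsPerVolume;
    }

    public static void main(String[] args) {
        // Single volume at the default threadsPerVolume = 10:
        // just one event loop thread shared between boss and worker.
        System.out.println(eventLoopGroupSize(1, 10)); // 1
        System.out.println(executorPoolSize(1, 10));   // 10
    }
}
```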
Using the freon DN Echo tool, I found that even a slight increase in the pool
size significantly improves throughput:
{noformat}
sudo -u hdfs ozone freon dne --clients=32 --container-id=1001 -t 32 -n 10000000
--sleep-time-ms=0 --read-only
hdds.datanode.read.chunk.threads.per.volume = 10 (default):
mean rate = 44125.45 calls/second
hdds.datanode.read.chunk.threads.per.volume = 20:
mean rate = 61322.60 calls/second
hdds.datanode.read.chunk.threads.per.volume = 40:
mean rate = 77951.91 calls/second
hdds.datanode.read.chunk.threads.per.volume = 100:
mean rate = 65573.07 calls/second
hdds.datanode.read.chunk.threads.per.volume = 1000:
mean rate = 25079.32 calls/second
{noformat}
So it appears that increasing the default value to 40 has a positive impact.
Alternatively, we should consider decoupling the thread pool size from the
number of volumes; otherwise the value becomes too large for, say, 48 disks.
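One possible shape for the decoupling is to keep the current formula but cap it relative to CPU count. This is only an illustrative sketch under the assumption that a processor-based cap is acceptable; the constants are not a tested recommendation:

```java
// Illustrative alternative sizing, not a proposed patch: derive the event
// loop group size from the existing formula but cap it so a many-disk node
// does not get an oversized group.
public class CappedEventLoopSizing {

    static int cappedEventLoopGroupSize(int numVolumes, int threadsPerVolume,
                                        int cap) {
        int raw = numVolumes * threadsPerVolume / 10;
        return Math.min(Math.max(raw, 1), cap);
    }

    public static void main(String[] args) {
        // Hypothetical cap proportional to CPU count.
        int cap = 2 * Runtime.getRuntime().availableProcessors();
        // 48 volumes at threadsPerVolume = 40 would otherwise yield
        // 48 * 40 / 10 = 192 event loop threads.
        System.out.println(48 * 40 / 10); // 192
        System.out.println(cappedEventLoopGroupSize(48, 40, cap));
    }
}
```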
Note:
For reference, DN echo in Ratis read-only mode achieves about 83k requests per
second on the same host, and OM echo in read-only mode achieves about 38k
requests per second.
> Increase DataNode XceiverServerGrpc event loop group size
> ---------------------------------------------------------
>
> Key: HDDS-10908
> URL: https://issues.apache.org/jira/browse/HDDS-10908
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Datanode
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)