[ 
https://issues.apache.org/jira/browse/HDDS-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-15338:
-------------------------------
    Description: 
Currently, the implementation of hash for FixedThreadPoolWithAffinityExecutor 
(used to group the ContainerReportQueue assigned for container reports) is as 
follow

{code:java}
    // For messages that need to be routed to the same thread need to
    // implement hashCode to match the messages. This should be safe for
    // other messages that implement the native hash.
    int index = message.hashCode() & (workQueues.size() - 1);
    BlockingQueue<Q> queue = workQueues.get(index);
{code}

However, this might cause skew when the queueCount is not a power of two (like 
the when the default is 10)

For queueCount = 10

{code:java}
queueCount - 1 = 9
9 in binary = 1001
{code}

So the routing becomes:

{code:java}
queueIndex = hash & 9
{code}

That means only the bits represented by 1001 are kept. The possible results are 
only:

{code:java}
0  -> 0000
1  -> 0001
8  -> 1000
9  -> 1001
{code}

So with 10 queues, only queues 0, 1, 8, and 9 can be selected. Queues 2, 3, 4, 
5, 6, 7 are effectively unused.

Example

{code:java}
hash 0  -> 0000 & 1001 = 0000 = queue 0
hash 1  -> 0001 & 1001 = 0001 = queue 1
hash 2  -> 0010 & 1001 = 0000 = queue 0
hash 3  -> 0011 & 1001 = 0001 = queue 1
hash 4  -> 0100 & 1001 = 0000 = queue 0
hash 5  -> 0101 & 1001 = 0001 = queue 1
hash 8  -> 1000 & 1001 = 1000 = queue 8
hash 9  -> 1001 & 1001 = 1001 = queue 9
hash 10 -> 1010 & 1001 = 1000 = queue 8
hash 11 -> 1011 & 1001 = 1001 = queue 9
{code}

*So the configured pool says “10 FCR processing queues”, but in practice it 
behaves closer to 4 active queues.*

This is observed in the MAT tool check in HDDS-15330.

If the queue count were 16:

{code:java}
queueCount - 1 = 15
15 in binary = 1111
queueIndex = hash & 15
{code}

Then possible queue indexes are 0..15, so all 16 queues can be used.

Currently, we can set the queue count to power of two (e.g. 16).

However, to fix the skew entirely, we can use modulo instead

For example,
{code:java}
index = Math.floorMod(hash, queueCount);
{code}

Hopefully through this we can ensure FCR is processed quickly and SCM memory 
can be freed.


  was:
Currently, the implementation of hash for FixedThreadPoolWithAffinityExecutor 
(used to group the ContainerReportQueue assigned for container reports) is as 
follow

{code:java}
    // For messages that need to be routed to the same thread need to
    // implement hashCode to match the messages. This should be safe for
    // other messages that implement the native hash.
    int index = message.hashCode() & (workQueues.size() - 1);
    BlockingQueue<Q> queue = workQueues.get(index);
{code}

However, this might cause skew when the queueCount is not a power of two (like 
the when the default is 10)

For queueCount = 10

{code:java}
queueCount - 1 = 9
9 in binary = 1001
{code}

So the routing becomes:

{code:java}
queueIndex = hash & 9
{code}

That means only the bits represented by 1001 are kept. The possible results are 
only:

{code:java}
0  -> 0000
1  -> 0001
8  -> 1000
9  -> 1001
{code}

So with 10 queues, only queues 0, 1, 8, and 9 can be selected. Queues 2, 3, 4, 
5, 6, 7 are effectively unused.

Example

{code:java}
hash 0  -> 0000 & 1001 = 0000 = queue 0
hash 1  -> 0001 & 1001 = 0001 = queue 1
hash 2  -> 0010 & 1001 = 0000 = queue 0
hash 3  -> 0011 & 1001 = 0001 = queue 1
hash 4  -> 0100 & 1001 = 0000 = queue 0
hash 5  -> 0101 & 1001 = 0001 = queue 1
hash 8  -> 1000 & 1001 = 1000 = queue 8
hash 9  -> 1001 & 1001 = 1001 = queue 9
hash 10 -> 1010 & 1001 = 1000 = queue 8
hash 11 -> 1011 & 1001 = 1001 = queue 9
{code}

*So the configured pool says “10 FCR processing queues”, but in practice it 
behaves closer to 4 active queues.*

This is observed in the MAT tool check in HDDS-15330.

If the queue count were 16:

{code:java}
queueCount - 1 = 15
15 in binary = 1111
queueIndex = hash & 15
{code}

Then possible queue indexes are 0..15, so all 16 queues can be used.

Currently, we can set the queue count to power of two (e.g. 16).

However, to fix the skew entirely, we can use modulo instead

For example,
{code:java}
index = Math.floorMod(hash, queueCount);
{code}



> Fix FixedThreadPoolWithAffinityExecutor skew when queue size is not power of 
> two
> --------------------------------------------------------------------------------
>
>                 Key: HDDS-15338
>                 URL: https://issues.apache.org/jira/browse/HDDS-15338
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> Currently, the implementation of hash for FixedThreadPoolWithAffinityExecutor 
> (used to group the ContainerReportQueue assigned for container reports) is as 
> follow
> {code:java}
>     // For messages that need to be routed to the same thread need to
>     // implement hashCode to match the messages. This should be safe for
>     // other messages that implement the native hash.
>     int index = message.hashCode() & (workQueues.size() - 1);
>     BlockingQueue<Q> queue = workQueues.get(index);
> {code}
> However, this might cause skew when the queueCount is not a power of two 
> (like the when the default is 10)
> For queueCount = 10
> {code:java}
> queueCount - 1 = 9
> 9 in binary = 1001
> {code}
> So the routing becomes:
> {code:java}
> queueIndex = hash & 9
> {code}
> That means only the bits represented by 1001 are kept. The possible results 
> are only:
> {code:java}
> 0  -> 0000
> 1  -> 0001
> 8  -> 1000
> 9  -> 1001
> {code}
> So with 10 queues, only queues 0, 1, 8, and 9 can be selected. Queues 2, 3, 
> 4, 5, 6, 7 are effectively unused.
> Example
> {code:java}
> hash 0  -> 0000 & 1001 = 0000 = queue 0
> hash 1  -> 0001 & 1001 = 0001 = queue 1
> hash 2  -> 0010 & 1001 = 0000 = queue 0
> hash 3  -> 0011 & 1001 = 0001 = queue 1
> hash 4  -> 0100 & 1001 = 0000 = queue 0
> hash 5  -> 0101 & 1001 = 0001 = queue 1
> hash 8  -> 1000 & 1001 = 1000 = queue 8
> hash 9  -> 1001 & 1001 = 1001 = queue 9
> hash 10 -> 1010 & 1001 = 1000 = queue 8
> hash 11 -> 1011 & 1001 = 1001 = queue 9
> {code}
> *So the configured pool says “10 FCR processing queues”, but in practice it 
> behaves closer to 4 active queues.*
> This is observed in the MAT tool check in HDDS-15330.
> If the queue count were 16:
> {code:java}
> queueCount - 1 = 15
> 15 in binary = 1111
> queueIndex = hash & 15
> {code}
> Then possible queue indexes are 0..15, so all 16 queues can be used.
> Currently, we can set the queue count to power of two (e.g. 16).
> However, to fix the skew entirely, we can use modulo instead
> For example,
> {code:java}
> index = Math.floorMod(hash, queueCount);
> {code}
> Hopefully through this we can ensure FCR is processed quickly and SCM memory 
> can be freed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to