[ 
https://issues.apache.org/jira/browse/HADOOP-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985726#comment-14985726
 ] 

Staffan Friberg commented on HADOOP-12528:
------------------------------------------

The number of entries into Thread Park during 10 minutes on a NN with 60 IPC 
threads goes down from 36000 to around 900. It seems like one group of IPC 
threads wakes up every 20s and the other every 2 minutes. I was doing a large 
file delete, so I am not sure whether that would increase the 
heartbeating/communication, or anything other than the amount of data 
transferred.

So with the 1s poll, you will have 60 threads waking up each second and then 
going back to sleep again. For large clusters with more IPC threads this would 
go up even further.
How many IPC threads will a very large cluster be configured with?
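
As a rough check of the numbers above, here is a minimal sketch (not code 
from the patch; the class and method names are made up for illustration). 
idleWakeups() just multiplies handler threads by poll timeouts in the 
measurement window, and pollUntilItem() shows the kind of timed-poll loop 
that produces those empty wakeups on an idle queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of Hadoop.
class PollVsTake {
    /** Empty wakeups expected from idle handlers that poll with a fixed timeout. */
    public static long idleWakeups(int threads, long windowSeconds, long pollSeconds) {
        return threads * (windowSeconds / pollSeconds);
    }

    /** Timed-poll loop: returns how many polls timed out before an item arrived. */
    public static int pollUntilItem(BlockingQueue<String> queue, long pollMillis)
            throws InterruptedException {
        int timeouts = 0;
        while (queue.poll(pollMillis, TimeUnit.MILLISECONDS) == null) {
            timeouts++; // the thread woke up, found nothing, and parks again
        }
        return timeouts;
    }

    public static void main(String[] args) throws Exception {
        // 60 IPC threads, 10 minute window, 1s poll
        System.out.println(idleWakeups(60, 600, 1)); // prints 36000

        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(200); // queue stays empty for a while
                queue.put("call");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int timeouts = pollUntilItem(queue, 20);
        producer.join();
        System.out.println("empty wakeups before work arrived: " + timeouts);
    }
}
```

With a blocking take() in place of the timed poll, the handler would park 
once and stay parked until the producer's put(), so the empty wakeups go away.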

The other cost is that each time a thread enters the wait, a small allocation 
for the synchronization object occurs.

What JVM metrics are you collecting and how?

> Avoid spinning in CallQueueManager.take()
> -----------------------------------------
>
>                 Key: HADOOP-12528
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12528
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: performance
>    Affects Versions: 2.7.1
>            Reporter: Staffan Friberg
>            Assignee: Staffan Friberg
>            Priority: Minor
>         Attachments: HADOOP-12528.001.patch, HADOOP-12528.002.patch
>
>
> When an IPC thread (Server$Handler) calls take() to get the next Call, the 
> CallQueueManager does a poll() instead of a take() on the internal queue.
> This causes threads to wake up, unnecessarily waste some CPU, and do extra 
> allocation as part of the internal await/signal mechanism each time the 
> thread redoes poll().
> This patch uses take() on the queue instead of poll(), which keeps the thread 
> in the await state until work is available. Since threads will be blocked on 
> the queue indefinitely, the swapping of queues requires a bit of extra work to 
> make sure threads wake up and do a take() on the new queue.
> Updated the test TestCallQueueManager.testSwapUnderContention() to ensure 
> that no threads get stuck on the old queue as part of swapping.
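
One possible way to get the "extra work" on swap, sketched under my own 
assumptions and not taken from the attached patches: handlers block in take(), 
and swap() drops one wake-up marker per handler into the retired queue so that 
any handler parked there returns, re-reads the queue reference and blocks on 
the new queue. SwappableCallQueue, WAKE_UP and selfCheck() are hypothetical 
names, and the sketch only uses unbounded queues so that put() never blocks 
while holding the lock.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; not the CallQueueManager implementation.
class SwappableCallQueue {
    // Marker object used only to wake handlers parked on a retired queue.
    private static final Object WAKE_UP = new Object();

    private final int handlerCount;
    private final AtomicReference<BlockingQueue<Object>> current =
            new AtomicReference<>(new LinkedBlockingQueue<>());

    SwappableCallQueue(int handlerCount) {
        this.handlerCount = handlerCount;
    }

    // Synchronized with swap() so a call is never added to a retired queue.
    public synchronized void put(Object call) {
        current.get().add(call); // unbounded queue, so add() does not block
    }

    // Blocks until a real call is available; there are no periodic wakeups.
    public Object take() throws InterruptedException {
        while (true) {
            Object e = current.get().take();
            if (e != WAKE_UP) {
                return e;
            }
            // Woken by a swap: loop, re-read the reference, block on the new queue.
        }
    }

    public synchronized void swap() {
        BlockingQueue<Object> fresh = new LinkedBlockingQueue<>();
        BlockingQueue<Object> old = current.getAndSet(fresh);
        old.drainTo(fresh); // move calls that were still pending
        for (int i = 0; i < handlerCount; i++) {
            old.add(WAKE_UP); // one marker per handler that may be parked on old
        }
    }

    // A handler parked before the swap must still receive a call put after it.
    public static boolean selfCheck() {
        try {
            SwappableCallQueue q = new SwappableCallQueue(1);
            Object[] got = new Object[1];
            Thread handler = new Thread(() -> {
                try {
                    got[0] = q.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            handler.start();
            Thread.sleep(100); // give the handler time to park on the old queue
            q.swap();
            q.put("call");
            handler.join(5000);
            return "call".equals(got[0]);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(selfCheck());
    }
}
```

The marker count equals the number of handlers because each handler can 
consume at most one marker from a given retired queue before it moves on; 
leftover markers are discarded together with the old queue.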



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
