[
https://issues.apache.org/jira/browse/CLOUDSTACK-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247746#comment-16247746
]
ASF subversion and git services commented on CLOUDSTACK-10136:
--------------------------------------------------------------
Commit 3ee8d83621c23f976413fdce6d9245197497d504 in cloudstack's branch
refs/heads/master from [[email protected]]
[ https://gitbox.apache.org/repos/asf?p=cloudstack.git;h=3ee8d83 ]
CLOUDSTACK-10136: Fix RemoteHostEndPoint thread growth
This fixes the following:
- Unchecked thread growth in RemoteHostEndPoint
- Potential NPE while finding EP for a storage/scope
Unbounded thread growth can be reproduced as follows:
- Every unreachable template produces 6 new threads (in a single
ScheduledExecutorService instance), spaced 10 seconds apart.
- Every reachable template URL that does not actually serve the template
produces 1 new thread (and one ScheduledExecutorService instance); it
errors out quickly without causing further thread growth.
- Every valid URL produces up to 10 threads, because the same EP
(endpoint instance) is reused to query upload/download progress via
async callbacks.
Every RemoteHostEndPoint instance creates its own
ScheduledExecutorService instance, which is why the jstack dump shows
many threads sharing the prefix RemoteHostEndPoint-{1..10} (the pool
size is 10, so the suffixes run from 1 to 10).
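The growth pattern above can be reproduced in isolation with the JDK alone. This is a hypothetical stand-in, not the CloudStack code: `LeakyEndPoint`, `scheduleAnswerProcessing` and `totalThreads` are illustrative names, and the 1-hour delay only keeps tasks pending so the threads stay observable. It relies on how OpenJDK's ScheduledThreadPoolExecutor behaves: each scheduled task starts a new core thread until the core size (10 here) is reached, and core threads do not time out by default. That is one plausible mechanism consistent with the 6-thread and up-to-10-thread counts listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PerInstanceExecutorSketch {

    // Hypothetical stand-in for the pre-fix endpoint: every instance owns a
    // private scheduled pool (core size 10) that nothing ever shuts down.
    static final class LeakyEndPoint {
        private final ScheduledThreadPoolExecutor executor;

        LeakyEndPoint(String prefix) {
            AtomicInteger counter = new AtomicInteger();
            ThreadFactory factory = r -> {
                Thread t = new Thread(r, prefix + counter.incrementAndGet());
                t.setDaemon(true); // daemon only so this demo JVM can exit
                return t;
            };
            this.executor = new ScheduledThreadPoolExecutor(10, factory);
        }

        // Delayed answer processing; each call starts a new thread until 10 exist.
        void scheduleAnswerProcessing(Runnable task) {
            executor.schedule(task, 1, TimeUnit.HOURS);
        }

        int threadCount() {
            return executor.getPoolSize();
        }

        void close() {
            executor.shutdownNow();
        }
    }

    // Total threads owned by `endpoints` instances after `tasksEach` tasks each.
    static int totalThreads(int endpoints, int tasksEach) {
        List<LeakyEndPoint> eps = new ArrayList<>();
        try {
            int total = 0;
            for (int e = 0; e < endpoints; e++) {
                LeakyEndPoint ep = new LeakyEndPoint("RemoteHostEndPoint-");
                eps.add(ep);
                for (int i = 0; i < tasksEach; i++) {
                    ep.scheduleAnswerProcessing(() -> { });
                }
                total += ep.threadCount();
            }
            return total;
        } finally {
            for (LeakyEndPoint ep : eps) {
                ep.close(); // clean up so the demo itself does not leak
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(totalThreads(1, 1));  // 1
        System.out.println(totalThreads(1, 6));  // 6
        System.out.println(totalThreads(1, 25)); // 10: capped at the core size
        System.out.println(totalThreads(3, 6));  // 18: every instance has its own pool
    }
}
```

The last line is the important one: because the pool is per instance, the cap of 10 applies to each endpoint separately, so the total grows with the number of endpoints created.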
This fixes the discovered thread leak, with the following notes:
- Instead of a ScheduledExecutorService per instance, a cached thread
pool is now used, declared `static` so that it is shared among all
RemoteHostEndPoint instances.
- There was no clear reason to wait once Answers have been returned
from the remote EP, so a scheduled/delayed Runnable is not needed for
processing them. ScheduledExecutorService was therefore replaced with
a plain ExecutorService.
- Another benefit of a cached pool is that idle threads are reused for
future Runnable submissions, and threads left unused for 60 seconds
are terminated.
- Caveat: the executor service is still unbounded; however, it is only
used for short jobs that check upload/download progress, which suits
a cached pool.
- Refactored CmdRunner so that it no longer uses or references objects
from its parent class.
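A minimal sketch of the shape of this fix, assuming a simplified endpoint: one `static` pool configured exactly like `Executors.newCachedThreadPool` (zero core threads, unbounded maximum, 60-second keep-alive, `SynchronousQueue` hand-off), answers submitted immediately rather than scheduled, and a `CmdRunner` that receives everything through its constructor. The class name, `processAnswer`, `runAnswers` and the `String` answer type are hypothetical; the actual CloudStack code differs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class SharedCachedPoolSketch {

    private static final AtomicInteger THREAD_ID = new AtomicInteger();

    private static final ThreadFactory FACTORY = r -> {
        Thread t = new Thread(r, "RemoteHostEndPoint-" + THREAD_ID.incrementAndGet());
        t.setDaemon(true); // daemon only so this demo JVM can exit
        return t;
    };

    // Same configuration as Executors.newCachedThreadPool(FACTORY). It is static,
    // so all endpoint instances share one pool. The maximum is still unbounded.
    static final ExecutorService EXECUTOR = new ThreadPoolExecutor(
            0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
            new SynchronousQueue<>(), FACTORY);

    // Static nested runner: it gets its inputs via the constructor instead of
    // reading fields of an enclosing endpoint object.
    static final class CmdRunner implements Runnable {
        private final String answer;
        private final Consumer<String> callback;

        CmdRunner(String answer, Consumer<String> callback) {
            this.answer = answer;
            this.callback = callback;
        }

        @Override
        public void run() {
            callback.accept(answer);
        }
    }

    // Hand the answer to the shared pool right away; no scheduling delay.
    static void processAnswer(String answer, Consumer<String> callback) {
        EXECUTOR.execute(new CmdRunner(answer, callback));
    }

    // Processes n answers and returns the names of the threads that handled them.
    static List<String> runAnswers(int n) {
        List<String> threadNames = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            processAnswer("answer-" + i, a -> {
                threadNames.add(Thread.currentThread().getName());
                done.countDown();
            });
        }
        try {
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("answers not processed in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return threadNames;
    }

    public static void main(String[] args) {
        List<String> names = runAnswers(50);
        System.out.println(names.size() + " answers processed");
        System.out.println(new HashSet<>(names).size() + " distinct threads used");
    }
}
```

How many distinct threads are used depends on timing: a thread that has finished a task and is waiting on the hand-off queue picks up the next submission, otherwise a new one is created. Either way, idle threads disappear after 60 seconds instead of accumulating per endpoint.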
Signed-off-by: Rohit Yadav <[email protected]>
> Fix thread growth/leak issue
> ----------------------------
>
> Key: CLOUDSTACK-10136
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10136
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public (Anyone can view this level - this is the
> default.)
> Affects Versions: 4.5.2, 4.6.2, 4.7.1, 4.10.0.0, 4.9.2.0, 4.8.1.1, 4.9.3.0
> Reporter: Rohit Yadav
> Assignee: Rohit Yadav
> Fix For: 4.11.0.0
>
>
> For a long-running management server with a large number of templates
> etc., a large number of waiting threads whose names start with the
> 'RemoteHostEndPoint-' prefix are seen. These async threads are mostly
> responsible for checking template/volume upload/download progress and
> state. They kick in every time a template is checked, downloaded, set up, etc.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)