[
https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097798#comment-16097798
]
Stephan Erb commented on AURORA-1939:
-------------------------------------
This is now on master. Thanks for the patch!
{code}
commit cdc5b8efd5bb86d38f73cca6d91903078b120333
Author: Reza Motamedi [email protected]
Date: Sat Jul 22 20:28:50 2017 +0200
Remove psutil's oneshot
After a lot of testing on busy machines, I realized that psutil's oneshot is
not threadsafe. I contacted the developer but have not received a concrete
fix.
Please read https://issues.apache.org/jira/browse/AURORA-1939 and
https://github.com/giampaolo/psutil/issues/1110 for more information.
These inconsistencies disappear after removing oneshot.
Reviewed at https://reviews.apache.org/r/61016/
src/main/python/apache/thermos/monitoring/process_collector_psutil.py | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
{code}
> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
> Key: AURORA-1939
> URL: https://issues.apache.org/jira/browse/AURORA-1939
> Project: Aurora
> Issue Type: Bug
> Reporter: Reza Motamedi
> Assignee: Reza Motamedi
> Priority: Minor
>
> Thermos Observer uses `psutil` to monitor resource consumption of Thermos
> Processes. On a busy machine, I have noticed negative CPU values when
> visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly
> create short-lived children. This indicates that, between the time
> `process_collector_psutil` looks up a Process's children and the time it
> calculates their CPU time, the pid of a child is reused by another, much
> younger process, which leads to negative CPU times.
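The pid-reuse hypothesis in the description can be illustrated with a small sketch. It is not the committed change (the patch removed psutil's oneshot) and it does not call psutil; `Sample`, `naive_rate` and `guarded_rate` are hypothetical names, and the numbers are made up. It only shows why a delta keyed on the pid alone can go negative, and how also comparing the process start time would catch the reuse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One CPU reading for a process (hypothetical type, for illustration)."""
    pid: int
    create_time: float  # process start time, seconds since the epoch
    cpu_seconds: float  # cumulative user + system CPU time
    taken_at: float     # wall-clock time of the reading


def naive_rate(prev: Sample, cur: Sample) -> float:
    """CPU rate keyed on the pid alone; wrong if the pid was reused."""
    return (cur.cpu_seconds - prev.cpu_seconds) / (cur.taken_at - prev.taken_at)


def guarded_rate(prev: Sample, cur: Sample) -> Optional[float]:
    """CPU rate keyed on (pid, create_time); None when the pid was reused."""
    if (prev.pid, prev.create_time) != (cur.pid, cur.create_time):
        return None  # a different process sits behind the same pid
    elapsed = cur.taken_at - prev.taken_at
    if elapsed <= 0:
        return None  # no usable time window between the two readings
    return (cur.cpu_seconds - prev.cpu_seconds) / elapsed


# A short-lived child exits and a much younger process receives its pid.
old_child = Sample(pid=4242, create_time=1000.0, cpu_seconds=5.0, taken_at=1010.0)
new_child = Sample(pid=4242, create_time=1010.5, cpu_seconds=0.5, taken_at=1011.0)

print(naive_rate(old_child, new_child))    # negative: the CPU counter "went backwards"
print(guarded_rate(old_child, new_child))  # None: reuse detected, sample discarded
```

Returning None rather than clamping to zero keeps a reused pid from being reported as an idle process; the caller can simply drop the sample.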