[
https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093883#comment-16093883
]
Reza Motamedi commented on AURORA-1939:
---------------------------------------
I just started tailing my logs and as samples came in catted the
proc/`<pid>`/stat.
log snippets:
{noformat}
~ $ tail -n 1000 -f thermos-observer.DEBUG | grep 62328
D0719 22:01:37.055901 24474 process_collector_psutil.py:105] Calculated rate
for pid=62328 and children: -1.94176829079 (old: 1488.840000, new: 920.870000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866:
1500007038.32} [{62328: ProcessSample(rate=0.0, user=0.0, system=0.02,
rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=526.98, system=961.84, rss=35430400,
vms=1600204800, nice=0, status='sleeping', threads=20)}] [{62328:
ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360,
nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0,
user=332.69, system=588.16, rss=35430400, vms=1600204800, nice=0,
status='sleeping', threads=20)}]
D0719 22:06:22.156851 24474 process_collector_psutil.py:105] Calculated rate
for pid=62328 and children: -0.00277094847146 (old: 920.870000, new:
920.080000) {62328: 1500007014.22, 866: 1500007038.32} -> {62328:
1500007014.22, 866: 1500007038.32} [{62328: ProcessSample(rate=0.0, user=0.0,
system=0.02, rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1),
866: ProcessSample(rate=0.0, user=332.69, system=588.16, rss=35430400,
vms=1600204800, nice=0, status='sleeping', threads=20)}] [{62328:
ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360,
nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0,
user=329.27, system=590.79, rss=34787328, vms=1600204800, nice=0,
status='sleeping', threads=20)}]
D0719 22:11:07.052711 24474 process_collector_psutil.py:105] Calculated rate
for pid=62328 and children: -0.0636723871193 (old: 920.080000, new: 901.940000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866:
1500007038.32} [{62328: ProcessSample(rate=0.0, user=0.0, system=0.02,
rss=2609152, vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=329.27, system=590.79, rss=34787328,
vms=1600204800, nice=0, status='sleeping', threads=20)}] [{62328:
ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152, vms=11919360,
nice=0, status='sleeping', threads=1), 866: ProcessSample(rate=0.0,
user=326.72, system=575.2, rss=35430400, vms=1600204800, nice=0,
status='sleeping', threads=20)}]
{noformat}
checking values directly. Also added date so that we can compare times.
{noformat}
~ $ date; cat /proc/866/stat | awk '{print $14, $15}'
Wed Jul 19 22:01:41 UTC 2017
52733 96241
~ $ date; cat /proc/866/stat | awk '{print $14, $15}'
Wed Jul 19 22:06:27 UTC 2017
52766 96296
{noformat}
> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
> Key: AURORA-1939
> URL: https://issues.apache.org/jira/browse/AURORA-1939
> Project: Aurora
> Issue Type: Bug
> Reporter: Reza Motamedi
> Priority: Minor
>
> Thermos Observer uses `psutil` to monitor resource consumption of Thermos
> Processes. On a busy machine, I have noticed negative CPU values when
> visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly
> create short lived children. This indicates that in time between
> `process_collector_psutil` looks up the Process children and the time it
> calculates the CPU time the pid of the child is actually reused by another
> much younger process, which leads to negative CPU times.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)