[
https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093763#comment-16093763
]
Reza Motamedi commented on AURORA-1939:
---------------------------------------
I added more debug logs to trace what is kept in the process tree:
{noformat}
D0719 20:23:05.758181 24474 process_collector_psutil.py:105] Calculated rate
for pid=62328 and children: 0.000365496747142 (old: 885.570000, new: 885.670000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866:
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152,
vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=315.39, system=570.16, rss=36704256,
vms=1601019904, nice=0, status='sleeping', threads=20)}]
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152,
vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=314.96, system=570.69, rss=36134912,
vms=1601019904, nice=0, status='sleeping', threads=20)}]
D0719 20:27:44.454570 24474 process_collector_psutil.py:105] Calculated rate
for pid=62328 and children: 2.10562464635 (old: 885.670000, new: 1472.500000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866:
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152,
vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=314.96, system=570.69, rss=36134912,
vms=1601019904, nice=0, status='sleeping', threads=20)}]
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152,
vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=520.93, system=951.55, rss=35430400,
vms=1600204800, nice=0, status='sleeping', threads=20)}]
D0719 20:32:29.451004 24474 process_collector_psutil.py:105] Calculated rate
for pid=62328 and children: 0.00294740555899 (old: 1472.500000, new:
1473.340000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866:
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152,
vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=520.93, system=951.55, rss=35430400,
vms=1600204800, nice=0, status='sleeping', threads=20)}]
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152,
vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=521.23, system=952.09, rss=35430400,
vms=1600204800, nice=0, status='sleeping', threads=20)}]
D0719 20:37:10.152525 24474 process_collector_psutil.py:105] Calculated rate
for pid=62328 and children: 0.00299250250927 (old: 1473.340000, new:
1474.180000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866:
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152,
vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=521.23, system=952.09, rss=35430400,
vms=1600204800, nice=0, status='sleeping', threads=20)}]
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152,
vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=521.55, system=952.61, rss=35430400,
vms=1600204800, nice=0, status='sleeping', threads=20)}]
D0719 20:42:58.449567 24474 process_collector_psutil.py:105] Calculated rate
for pid=62328 and children: -1.64913818086 (old: 1474.180000, new: 899.790000)
{62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866:
1500007038.32}
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2609152,
vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=521.55, system=952.61, rss=35430400,
vms=1600204800, nice=0, status='sleeping', threads=20)}]
[{62328: ProcessSample(rate=0.0, user=0.0, system=0.02, rss=2732032,
vms=11919360, nice=0, status='sleeping', threads=1), 866:
ProcessSample(rate=0.0, user=322.65, system=577.12, rss=35430400,
vms=1600204800, nice=0, status='sleeping', threads=20)}]
{noformat}
Note: {62328: 1500007014.22, 866: 1500007038.32} -> {62328: 1500007014.22, 866:
1500007038.32} what processes where in the old and new sample, as in old ->
new. The line after that is the ProcessSamples in the old pstree, kept in the
object as "pid => ProcessSample", and the line after is the new sample.
> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
> Key: AURORA-1939
> URL: https://issues.apache.org/jira/browse/AURORA-1939
> Project: Aurora
> Issue Type: Bug
> Reporter: Reza Motamedi
> Priority: Minor
>
> Thermos Observer uses `psutil` to monitor resource consumption of Thermos
> Processes. On a busy machine, I have noticed negative CPU values when
> visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly
> create short lived children. This indicates that in time between
> `process_collector_psutil` looks up the Process children and the time it
> calculates the CPU time the pid of the child is actually reused by another
> much younger process, which leads to negative CPU times.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)