Jay Buffington created AURORA-1320:
--------------------------------------

             Summary: when instance is running in docker container, thermos observer reports 0 resources
                 Key: AURORA-1320
                 URL: https://issues.apache.org/jira/browse/AURORA-1320
             Project: Aurora
          Issue Type: Bug
          Components: Docker, Thermos
            Reporter: Jay Buffington


To see the problem, start a job inside a Docker container and view the
task/instance page. You'll see CPU/RAM/disk all at zero regardless of their
actual usage.

I see errors like this in the thermos observer log:
{noformat}
    W0513 18:41:39.415406 3564 process_collector_psutil.py:42] Error during process sampling [pid=112]: process no longer exists (pid=112)
    W0513 18:41:39.415612 3564 process_collector_psutil.py:76] Error during process sampling: process no longer exists (pid=112)
    W0513 18:41:39.513972 3564 process_collector_psutil.py:76] Error during process sampling: no process found with pid 122
{noformat}

This is likely because the observer is running in a different PID namespace
than the process, so the PIDs the runner records inside the container do not
match the PIDs the observer sees. One solution would be for the runner to
write the PID namespace it is running in to the checkpoint and then have the
observer enter that namespace while sampling.
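A minimal sketch of how the namespace mismatch could be detected and worked around, not Aurora code. It assumes a Linux host with kernel 4.1+ (for the NSpid field in /proc/<pid>/status) and an observer allowed to read other processes' /proc/<pid>/ns/pid links (typically root). current_pid_namespace() is what the runner could write to the checkpoint; that checkpoint field and all function names here are hypothetical. Note that this swaps in a different technique from the one proposed above: instead of entering the namespace with setns(2), it translates the PID the runner recorded inside the container into the PID visible in the observer's /proc.

```python
import os


def current_pid_namespace(pid="self"):
    """Return a process's PID-namespace identifier, e.g. 'pid:[4026531836]'.

    Processes in the same PID namespace report the same string, so the
    runner could record this value and the observer could compare it to
    its own before trusting the recorded PIDs.
    """
    return os.readlink(f"/proc/{pid}/ns/pid")


def parse_nspid(status_text):
    """Extract the NSpid field from the contents of /proc/<pid>/status.

    The kernel lists the PID in each namespace the process belongs to,
    starting from the namespace /proc was mounted in and ending with the
    process's own namespace, so the last entry is the PID the process
    sees for itself (e.g. inside a container).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # field absent (e.g. kernel older than 4.1)


def find_host_pid(container_pid, target_namespace, proc_root="/proc"):
    """Map a PID recorded inside a container to the PID in our /proc view.

    target_namespace is the string the runner would have recorded with
    current_pid_namespace() (hypothetical checkpoint field).
    Returns None if no matching process is found.
    """
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries like 'self' or 'meminfo'
        try:
            if os.readlink(f"{proc_root}/{entry}/ns/pid") != target_namespace:
                continue
            with open(f"{proc_root}/{entry}/status") as f:
                nspids = parse_nspid(f.read())
        except OSError:
            continue  # process exited or we lack permission to inspect it
        if nspids and nspids[-1] == container_pid:
            return int(entry)
    return None
```

With the PIDs from the log, find_host_pid(112, recorded_namespace) would return the host-side PID of the process the runner knows as 112, which the observer could then hand to psutil. One reason to prefer translation over entering the namespace: per setns(2), joining a PID namespace only affects children the caller creates afterwards, not the caller itself, so the observer would have to fork to sample from inside it.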

Or we can just get rid of the observer?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
