Zameer Manji created AURORA-1809:
------------------------------------

             Summary: Investigate flaky test TestRunnerKillProcessGroup.test_pg_is_killed
                 Key: AURORA-1809
                 URL: https://issues.apache.org/jira/browse/AURORA-1809
             Project: Aurora
          Issue Type: Bug
            Reporter: Zameer Manji


If you run the test as part of the full test suite, it fails like this:
{noformat}
 ==================== FAILURES ====================
                     __ TestRunnerKillProcessGroup.test_pg_is_killed __
                     
                     self = <test_staged_kill.TestRunnerKillProcessGroup object at 0x7f0c79893e10>
                     
                         def test_pg_is_killed(self):
                           runner = self.start_runner()
                           tm = TaskMonitor(runner.tempdir, runner.task_id)
                           self.wait_until_running(tm)
                           process_state, run_number = tm.get_active_processes()[0]
                           assert process_state.process == 'process'
                           assert run_number == 0
                         
                           child_pidfile = os.path.join(runner.sandbox, runner.task_id, 'child.txt')
                           while not os.path.exists(child_pidfile):
                             time.sleep(0.1)
                           parent_pidfile = os.path.join(runner.sandbox, runner.task_id, 'parent.txt')
                           while not os.path.exists(parent_pidfile):
                             time.sleep(0.1)
                           with open(child_pidfile) as fp:
                             child_pid = int(fp.read().rstrip())
                           with open(parent_pidfile) as fp:
                             parent_pid = int(fp.read().rstrip())
                         
                           ps = ProcessProviderFactory.get()
                           ps.collect_all()
                           assert parent_pid in ps.pids()
                           assert child_pid in ps.pids()
                           assert child_pid in ps.children_of(parent_pid)
                         
                           with open(os.path.join(runner.sandbox, runner.task_id, 'exit.txt'), 'w') as fp:
                             fp.write('go away!')
                         
                           while tm.task_state() is not TaskState.SUCCESS:
                             time.sleep(0.1)
                         
                           state = tm.get_state()
                           assert state.processes['process'][0].state == ProcessState.SUCCESS
                         
                           ps.collect_all()
                           assert parent_pid not in ps.pids()
                     >     assert child_pid not in ps.pids()
                     E     assert 30475 not in set([1, 2, 3, 5, 7, 8, ...])
                     E      +  where set([1, 2, 3, 5, 7, 8, ...]) = <bound method ProcessProvider_Procfs.pids of <twitter.common.process.process_provider_procfs.ProcessProvider_Procfs object at 0x7f0c798b1990>>()
                     E      +    where <bound method ProcessProvider_Procfs.pids of <twitter.common.process.process_provider_procfs.ProcessProvider_Procfs object at 0x7f0c798b1990>> = <twitter.common.process.process_provider_procfs.ProcessProvider_Procfs object at 0x7f0c798b1990>.pids
                     
                     
                     src/test/python/apache/thermos/core/test_staged_kill.py:287: AssertionError
                     -------------- Captured stderr call --------------
                     WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
                     WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
                     WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
                     WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
                     WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
                     WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
                     WARNING:root:Could not read from checkpoint /tmp/tmp9WSRnw/checkpoints/1478305991773556-runner-base/runner
                      generated xml file: /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/415337499eb72578eab327a6487c1f5c9452b3d6.xml
                      1 failed, 719 passed, 6 skipped, 1 warnings in 206.00 seconds
                     
FAILURE
{noformat}


If you run the test on its own, it passes:
{noformat}
00:45:32 00:00 [main]
               (To run a reporting server: ./pants server)
00:45:32 00:00   [setup]
00:45:32 00:00     [parse]fatal: Not a git repository (or any of the parent directories): .git

               Executing tasks in goals: test
00:45:32 00:00   [test]
00:45:32 00:00     [test-prep-command]
00:45:32 00:00     [test]
00:45:32 00:00     [pytest]
00:45:32 00:00       [run]
                     ============== test session starts ===============
                     platform linux2 -- Python 2.7.6 -- py-1.4.31 -- pytest-2.6.4 -- /usr/bin/python2.7
                     plugins: cov, timeout
                     collected 83 items

                     
                     src/test/python/apache/thermos/core/test_staged_kill.py::TestRunnerKillProcessGroup::test_pg_is_killed PASSED

                      generated xml file: /home/vagrant/aurora/dist/test-results/src.test.python.apache.thermos.core.core.xml
                      82 tests deselected by '-kTestRunnerKillProcessGroup'
                      1 passed, 82 deselected, 1 warnings in 2.49 seconds
{noformat}
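
A possible starting point for the investigation, offered as an unconfirmed hypothesis: the test asserts `child_pid not in ps.pids()` immediately after the task state becomes SUCCESS, so the check may race with the child actually leaving the process table when the machine is busy running the full suite. The sketch below polls for a pid to disappear with a bounded timeout instead of checking once. The helper names `pid_exists` and `wait_until_gone` are hypothetical and are not part of Thermos or twitter.common; the sketch uses `os.kill(pid, 0)` rather than the `ProcessProvider` used in the test, and assumes a POSIX system.

```python
import errno
import os
import time


def pid_exists(pid):
    """Return True if a process with this pid is in the process table (POSIX only).

    Hypothetical helper for illustration. A zombie that has not been reaped
    still counts as existing.
    """
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only performs the existence check
    except OSError as e:
        # ESRCH: no such process. EPERM: the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def wait_until_gone(pid, timeout=10.0, interval=0.1):
    """Poll until the pid disappears or the timeout expires.

    Returns True if the pid is gone, False if it is still present at the deadline.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not pid_exists(pid):
            return True
        time.sleep(interval)
    return not pid_exists(pid)
```

If the test passes reliably in the full suite with a bounded wait like this in place of the single assertion, that would point to a timing issue in the test rather than a bug in process group teardown. If the child is still present after a generous timeout, the kill logic itself deserves a closer look.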




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
