On 09/27/2012 07:08 AM, Dave Pigott wrote:
I noticed in the reports view, several jobs which have been stuck for a
while:

http://validation.linaro.org/lava-server/scheduler/job/33203
-------------------------------------------------------------------------------
origen02
------------
A health check running for 4 days. Nothing in the log. I cancelled it,
but it was stuck in cancelling. So I went into admin, put it offline,
and then online to run a health check again. The job itself is still
showing as not finished. How do I track it down on control so that we
can kill it properly?


http://validation.linaro.org/lava-server/scheduler/job/33382
-------------------------------------------------------------------------------
origen04
------------
A regular job that failed, pushed its result bundle and then never quite
stopped running.  Same deal as 33203, but I can't get it to run its
health check. Any clues?


http://validation.linaro.org/lava-server/scheduler/job/33372
-------------------------------------------------------------------------------
panda09
------------
Same as origen04 - can't get health check to run.

I don't have the best answer for this, but I'll share what I do.

1) Run some "ps -ef | grep" type commands to see if a scheduler or dispatcher process is still running for that board. If one is, I kill it.

2) The job and board usually get left a bit out of sync, so I run my "cancel-job.py" script from control:/home/doanac/lava-scripts. It looks like:

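 #!/srv/lava/instances/production/bin/py
 # Force the given scheduler jobs into the CANCELED state.
 # Usage: cancel-job.py JOB_ID [JOB_ID ...]

 import sys
 import lava_scheduler_app.models as models

 for jid in sys.argv[1:]:
     jid = int(jid)
     # Single-argument print() behaves the same on Python 2 and 3.
     print("canceling: %d" % jid)
     job = models.TestJob.objects.get(pk=jid)
     # Only the database record changes; leftover processes are handled in step 1.
     job.status = job.CANCELED
     job.save()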
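
For step 1, here is a rough sketch of the "ps -ef | grep" check as a small Python helper. It is not something that exists on control. The function name, the sample process lines, and the assumption that the board name shows up in the command line of the stuck process are all mine, so treat it as an illustration only.

```python
def find_board_processes(ps_output, board):
    """Return (pid, command) pairs from `ps -ef` text whose command mentions `board`.

    Assumes the usual `ps -ef` layout: UID PID PPID C STIME TTY TIME CMD.
    """
    matches = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the full command line stays in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something that is not a process row
        pid, command = int(fields[1]), fields[7]
        if board in command:
            matches.append((pid, command))
    return matches


# Made-up sample output; the command names are placeholders, not real LAVA commands.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
lava      4121     1  0 Sep23 ?        00:00:04 example-dispatcher origen02
lava      4377     1  0 Sep26 ?        00:00:01 example-dispatcher panda09
"""

for pid, command in find_board_processes(SAMPLE, "origen02"):
    print("%d %s" % (pid, command))
```

On a real machine the text would come from running "ps -ef" itself, and whatever PIDs it reports would still need to be checked by hand before killing anything.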

I suspect when mwhudson logs in he may have a better answer.

_______________________________________________
linaro-validation mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/linaro-validation
