On 09/27/2012 07:08 AM, Dave Pigott wrote:
I noticed several jobs in the reports view that have been stuck for a
while:
http://validation.linaro.org/lava-server/scheduler/job/33203
-------------------------------------------------------------------------------
origen02
------------
A health check that has been running for 4 days. Nothing in the log. I
cancelled it, but it got stuck in cancelling. So I went into admin, put
the board offline, and then online to run a health check again. The job
itself is still showing as not finished. How do I track it down on
control so that we can kill it properly?
http://validation.linaro.org/lava-server/scheduler/job/33382
-------------------------------------------------------------------------------
origen04
------------
A regular job that failed, pushed its result bundle and then never quite
stopped running. Same deal as 33203, but I can't get it to run its
health check. Any clues?
http://validation.linaro.org/lava-server/scheduler/job/33372
-------------------------------------------------------------------------------
panda09
------------
Same as origen04 - can't get health check to run.

I don't have the best answer for this, but I'll share what I do.
1) Run some "ps -ef | grep" type commands to see whether a scheduler or
dispatcher process is still running for that board. I then kill those.
2) Usually the job and board get left a bit out of sync, so I run my
"cancel-job.py" script on control:/home/doanac/lava-scripts. It looks like:
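#!/srv/lava/instances/production/bin/py
# Force the given job ids into the CANCELED state in the scheduler database.
import sys

import lava_scheduler_app.models as models

for jid in sys.argv[1:]:
    jid = int(jid)
    print("canceling: %d" % jid)
    # Look the job up by primary key and overwrite its status.
    job = models.TestJob.objects.get(pk=jid)
    job.status = job.CANCELED
    job.save()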
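
For step 1, a minimal Python sketch of the "ps -ef | grep" filtering might look like the following. It is only an illustration, not one of the scripts in lava-scripts: the names find_board_pids and running_board_pids, the default keywords, and the assumption that the board name shows up on the process command line are all hypothetical. It only lists candidates; killing them is left as a manual step.

```python
import subprocess


def find_board_pids(ps_output, board, keywords=("scheduler", "dispatch")):
    """Return (pid, command) pairs from `ps -ef` output that mention the board.

    Assumes the usual Linux `ps -ef` layout with eight columns:
    UID PID PPID C STIME TTY TIME CMD
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)  # keep the whole command as one field
        if len(fields) < 8:
            continue  # not a well-formed process row
        pid, cmd = fields[1], fields[7].strip()
        # Keep only processes that mention both the board and a keyword.
        if board in cmd and any(k in cmd for k in keywords):
            matches.append((int(pid), cmd))
    return matches


def running_board_pids(board):
    """Run `ps -ef` on this machine and filter it for the given board."""
    out = subprocess.check_output(["ps", "-ef"]).decode("utf-8", "replace")
    return find_board_pids(out, board)
```

Calling running_board_pids("origen02") on control would then give a list of process ids to inspect before deciding which ones to kill.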
I suspect when mwhudson logs in he may have a better answer.