Hi all,

Something has changed in the job handling, and in a bad way. On my development machine, submitting jobs to the cluster didn't seem to be working anymore (the jobs were never sent to SGE). I killed Galaxy and restarted:

Starting server in PID 12180.
serving on http://127.0.0.1:8081
galaxy.jobs.runners.drmaa ERROR 2012-11-15 09:56:28,192 (320/None) Unable to check job status
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/drmaa.py", line 296, in check_watched_items
    state = self.ds.jobStatus( job_id )
  File "/mnt/galaxy/galaxy-central/eggs/drmaa-0.4b3-py2.6.egg/drmaa/__init__.py", line 522, in jobStatus
    _h.c(_w.drmaa_job_ps, jobName, _ct.byref(status))
  File "/mnt/galaxy/galaxy-central/eggs/drmaa-0.4b3-py2.6.egg/drmaa/helpers.py", line 213, in c
    return f(*(args + (error_buffer, sizeof(error_buffer))))
  File "/mnt/galaxy/galaxy-central/eggs/drmaa-0.4b3-py2.6.egg/drmaa/errors.py", line 90, in error_check
    raise _ERRORS[code-1]("code %s: %s" % (code, error_buffer.value))
InvalidArgumentException: code 4: Job id, "None", is not a valid job id
galaxy.jobs.runners.drmaa WARNING 2012-11-15 09:56:28,193 (320/None) job will now be errored
./run.sh: line 86: 12180 Segmentation fault (core dumped) python ./scripts/paster.py serve universe_wsgi.ini $@

I restarted and it happened again, third time lucky. I presume this was one segmentation fault for each orphaned/zombie job (since I'd tried two cluster jobs which got stuck).

I was running with revision 340438c62171,
https://bitbucket.org/galaxy/galaxy-central/changeset/340438c62171578078323d39da398d5053b69d0a
as merged into my tools branch,
https://bitbucket.org/peterjc/galaxy-central/changeset/d49200df0707579f41fc4f25042354604ce20e63

Any thoughts?

Thanks,

Peter
