On Tue, Oct 8, 2013 at 5:03 PM, Adhemar <azn...@gmail.com> wrote:
> Hi,
> After the last update I'm getting the following error.
> The job is submitted to SGE and executed, but galaxy doesn't get the result
> and keeps showing the job is executing (yellow box).
> Any clues?
> Thanks,
> Adhemar
>
>
>
> galaxy.jobs.runners ERROR 2013-10-08 13:01:18,488 Unhandled exception
> checking active jobs
> Traceback (most recent call last):
>   File
> "/opt/bioinformatics/share/galaxy20130410/lib/galaxy/jobs/runners/__init__.py",
> line 362, in monitor
>     self.check_watched_items()
>   File
> "/opt/bioinformatics/share/galaxy20130410/lib/galaxy/jobs/runners/drmaa.py",
> line 217, in check_watched_items
>     log.warning( "(%s/%s) job check resulted in %s: %s", galaxy_id_tag,
> external_job_id, e.__class__.name, e )
> AttributeError: type object 'InvalidJobException' has no attribute 'name'


Same here, running galaxy-central with an SGE cluster (actually UGE
but the same DRMAA wrapper etc) when cancelling several jobs via
qdel at the command line:

galaxy.jobs.runners ERROR 2013-10-10 15:16:35,731 Unhandled exception checking active jobs
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/__init__.py", line 362, in monitor
    self.check_watched_items()
  File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/drmaa.py", line 217, in check_watched_items
    log.warning( "(%s/%s) job check resulted in %s: %s", galaxy_id_tag, external_job_id, e.__class__.name, e )
AttributeError: type object 'InvalidJobException' has no attribute 'name'

$ hg branch
default
[galaxy@ppserver galaxy-central]$ hg heads | more
changeset:   11871:c8b55344e779
tag:         tip
user:        Ross Lazarus <ross.laza...@gmail.com>
date:        Tue Oct 08 16:30:54 2013 +1100
summary:     Proper removal of rgenetics deprecated tool wrappers

changeset:   11818:1f0e7ae9e324
branch:      stable
parent:      11761:a477486bf18e
user:        Daniel Blankenberg <d...@bx.psu.edu>
date:        Sun Sep 29 16:04:31 2013 +1000
summary:     Add additional check and slice to _sniffnfix_pg9_hex().
Fixes issue seen when attempting to view saved visualizations. Further
investigation may be needed.
...

Killing Galaxy and restarting didn't fix this; the errors persisted.
I tried this patch to work around the AttributeError in the logging call:

$ hg diff /mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/drmaa.py
diff -r c8b55344e779 lib/galaxy/jobs/runners/drmaa.py
--- a/lib/galaxy/jobs/runners/drmaa.py    Tue Oct 08 16:30:54 2013 +1100
+++ b/lib/galaxy/jobs/runners/drmaa.py    Thu Oct 10 15:21:56 2013 +0100
@@ -214,7 +214,10 @@
                 state = self.ds.jobStatus( external_job_id )
             # TODO: probably need to keep track of InvalidJobException count and remove after it exceeds some configurable
             except ( drmaa.DrmCommunicationException, drmaa.InternalException, drmaa.InvalidJobException ), e:
-                log.warning( "(%s/%s) job check resulted in %s: %s", galaxy_id_tag, external_job_id, e.__class__.name, e )
+                if hasattr(e.__class__, "name"):
+                    log.warning( "(%s/%s) job check resulted in %s: %s", galaxy_id_tag, external_job_id, e.__class__.name, e )
+                else:
+                    log.warning( "(%s/%s) job check resulted in: %s", galaxy_id_tag, external_job_id, e )
                 new_watched.append( ajs )
                 continue
             except Exception, e:
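
For what it's worth, the AttributeError itself looks like a simple typo:
a Python class exposes its name as __name__, not name, so a hasattr
check on "name" will always take the else branch. A minimal sketch of
the logging call with that corrected, written in Python 3 syntax and
using a stand-in exception class and a hypothetical describe() helper
rather than the real drmaa module (so this is an illustration, not the
actual Galaxy code):

```python
import logging

log = logging.getLogger("galaxy.jobs.runners.drmaa")


class InvalidJobException(Exception):
    """Stand-in for drmaa.InvalidJobException (assumption, for illustration)."""


def describe(exc):
    # Classes expose their name as __name__; there is no plain 'name' attribute.
    return "job check resulted in %s: %s" % (exc.__class__.__name__, exc)


try:
    raise InvalidJobException(
        "code 18: The job specified by the 'jobid' does not exist."
    )
except InvalidJobException as e:
    # The two ids are taken from the log output in this thread.
    log.warning("(%s/%s) %s", "251", "11372", describe(e))
```

With that change the exception class name would appear in the warning
without needing the hasattr guard.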


Now I get lots of these lines instead:

galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:16,489 (251/11372) job check resulted in: code 18: The job specified by the 'jobid' does not exist.
galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:16,533 (252/11373) job check resulted in: code 18: The job specified by the 'jobid' does not exist.
galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:17,580 (253/11374) job check resulted in: code 18: The job specified by the 'jobid' does not exist.
galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:17,624 (254/11375) job check resulted in: code 18: The job specified by the 'jobid' does not exist.
galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:17,668 (255/11376) job check resulted in: code 18: The job specified by the 'jobid' does not exist.
galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:17,712 (256/11377) job check resulted in: code 18: The job specified by the 'jobid' does not exist.
(this seems to repeat, endlessly)

I manually killed the jobs from the Galaxy history, and restarted
Galaxy again. That seemed to fix this.

If the DRMAA layer says the job was invalid (which is what I assume
InvalidJobException means), then surely the job has failed?
Perhaps something like this (untested)?

$ hg diff /mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/drmaa.py
diff -r c8b55344e779 lib/galaxy/jobs/runners/drmaa.py
--- a/lib/galaxy/jobs/runners/drmaa.py    Tue Oct 08 16:30:54 2013 +1100
+++ b/lib/galaxy/jobs/runners/drmaa.py    Thu Oct 10 15:27:28 2013 +0100
@@ -213,10 +213,15 @@
                 assert external_job_id not in ( None, 'None' ), '(%s/%s) Invalid job id' % ( galaxy_id_tag, external_job_id )
                 state = self.ds.jobStatus( external_job_id )
             # TODO: probably need to keep track of InvalidJobException count and remove after it exceeds some configurable
-            except ( drmaa.DrmCommunicationException, drmaa.InternalException, drmaa.InvalidJobException ), e:
+            except ( drmaa.DrmCommunicationException, drmaa.InternalException ), e:
                 log.warning( "(%s/%s) job check resulted in %s: %s", galaxy_id_tag, external_job_id, e.__class__.name, e )
                 new_watched.append( ajs )
                 continue
+            except drmaa.InvalidJobException, e:
+                log.warning( "(%s/%s) job check resulted in: %s", galaxy_id_tag, external_job_id, e )
+                ajs.fail_message = str(e)
+                self.work_queue.put( ( self.fail_job, ajs ) )
+                continue
             except Exception, e:
                 # so we don't kill the monitor thread
                 log.exception( "(%s/%s) Unable to check job status: %s" % ( galaxy_id_tag, external_job_id, str( e ) ) )

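A gentler alternative, along the lines of the TODO comment in drmaa.py,
might be to count InvalidJobException results per job and only fail the
job once the count passes some threshold. The sketch below is a
standalone simulation in Python 3 and is equally untested against
Galaxy: check_jobs, fake_status, MAX_INVALID_CHECKS and the stand-in
exception class are all hypothetical names, not part of Galaxy or the
drmaa module.

```python
class InvalidJobException(Exception):
    """Stand-in for drmaa.InvalidJobException (assumption, for illustration)."""


MAX_INVALID_CHECKS = 3  # hypothetical threshold; would presumably be configurable


def check_jobs(watched, job_status, invalid_counts, max_invalid=MAX_INVALID_CHECKS):
    """Run one polling pass and return (still_watched, failed).

    watched        -- list of external job ids to poll
    job_status     -- callable that raises InvalidJobException for unknown ids
    invalid_counts -- dict of consecutive invalid results, kept between passes
    """
    still_watched, failed = [], []
    for job_id in watched:
        try:
            job_status(job_id)
        except InvalidJobException:
            invalid_counts[job_id] = invalid_counts.get(job_id, 0) + 1
            if invalid_counts[job_id] >= max_invalid:
                # Give up on this job: report it as failed and stop tracking it.
                failed.append(job_id)
                invalid_counts.pop(job_id, None)
                continue
        else:
            # A successful check resets the counter for this job.
            invalid_counts.pop(job_id, None)
        still_watched.append(job_id)
    return still_watched, failed


def fake_status(job_id):
    # Pretend job 11372 was removed with qdel; every other job is still known.
    if job_id == "11372":
        raise InvalidJobException(
            "code 18: The job specified by the 'jobid' does not exist."
        )
    return "running"


counts = {}
watched = ["11372", "11373"]
for _ in range(MAX_INVALID_CHECKS):
    watched, failed = check_jobs(watched, fake_status, counts)
```

In this simulation job 11372 stays on the watch list for the first two
passes and is only reported as failed on the third, which would
tolerate a transient DRMAA hiccup without looping forever.
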
Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
