I am trying to get a Galaxy instance running on a Rocks cluster.  I am able
to run jobs with the local runner at this point, but I am having an issue
with the drmaa runner that I haven't been able to fix.  When I submit a job
in Galaxy it is successfully submitted to the cluster and runs to
completion according to qacct, but Galaxy just reports "failure running
job".

Here's what is written to paster.log when I submit a job:

69.181.235.240 - - [19/Jan/2016:11:24:31 -0700] "GET
/api/histories/fb86c918c0d3d33b/contents?dataset_details=bae154fe2294752e%2C6fe732485990d2ac%2C604c4e6e60e997bc%2Cf015f1cb819ec50e%2C9f6f4b3cb6cf43eb%2C3d13d598882b6eb8%2C551006fddcb290ae%2C10b9bbc646c48387%2C7670dfdf35146bc5%2Ce0ec2cf59f1fc79e%2Cee30922e5e4854db%2C9e7a0ba216194210
HTTP/1.1" 200 - "https://galaxy.soe.ucsc.edu/" "Mozilla/5.0 (Windows NT
10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/47.0.2526.111 Safari/537.36"
69.181.235.240 - - [19/Jan/2016:11:24:38 -0700] "GET
/tool_runner/data_source_redirect?tool_id=ucsc_table_direct1 HTTP/1.1" 302
- "https://galaxy.soe.ucsc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64;
x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111
Safari/537.36"
galaxy.tools.actions.__init__ INFO 2016-01-19 11:24:42,801 Handled output
(327.778 ms)
galaxy.tools.actions.__init__ INFO 2016-01-19 11:24:43,236 Verified access
to datasets (0.023 ms)
galaxy.tools.execute DEBUG 2016-01-19 11:24:43,343 Tool
[ucsc_table_direct1] created job [7019] (919.481 ms)
69.181.235.240 - - [19/Jan/2016:11:24:42 -0700] "POST /tool_runner
HTTP/1.1" 200 - "https://genome.ucsc.edu/cgi-bin/hgTables" "Mozilla/5.0
(Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/47.0.2526.111 Safari/537.36"
galaxy.jobs DEBUG 2016-01-19 11:24:44,056 (7019) Working directory for
job is: /campusdata/galaxy/galaxy/database/job_working_directory/007/7019
galaxy.jobs.handler DEBUG 2016-01-19 11:24:44,070 (7019) Dispatching to
sge runner
galaxy.jobs DEBUG 2016-01-19 11:24:44,378 (7019) Persisting job
destination (destination id: sge_default)
galaxy.jobs.runners DEBUG 2016-01-19 11:24:44,403 Job [7019] queued
(332.423 ms)
galaxy.jobs.handler INFO 2016-01-19 11:24:44,444 (7019) Job dispatched
69.181.235.240 - - [19/Jan/2016:11:24:44 -0700] "GET /api/genomes
HTTP/1.1" 200 - "https://galaxy.soe.ucsc.edu/" "Mozilla/5.0 (Windows NT
10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/47.0.2526.111 Safari/537.36"
69.181.235.240 - - [19/Jan/2016:11:24:44 -0700] "GET
/api/datatypes?extension_only=False& HTTP/1.1" 200 - "
https://galaxy.soe.ucsc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
69.181.235.240 - - [19/Jan/2016:11:24:44 -0700] "GET
/history/current_history_json HTTP/1.1" 200 - "
https://galaxy.soe.ucsc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
galaxy.jobs.runners DEBUG 2016-01-19 11:24:46,399 (7019) command is:
python /campusdata/galaxy/galaxy/tools/data_source/data_source.py
/campusdata/galaxy/galaxy/database/files/011/dataset_11361.dat 0;
return_code=$?; python
"/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/set_metadata_IaPURP.py"
"/campusdata/galaxy/galaxy/database/tmp/tmp9Qt0cv"
"/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/galaxy.json"
"/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/metadata_in_HistoryDatasetAssociation_13512_oucw5s,/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/metadata_kwds_HistoryDatasetAssociation_13512_ZrUbrF,/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/metadata_out_HistoryDatasetAssociation_13512_twCvq7,/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/metadata_results_HistoryDatasetAssociation_13512_FO1cy9,/campusdata/galaxy/galaxy/database/files/011/dataset_11361.dat,/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/metadata_override_HistoryDatasetAssociation_13512_Z_cUTF"
5242880; sh -c "exit $return_code"
galaxy.jobs.runners.drmaa DEBUG 2016-01-19 11:24:46,787 (7019) submitting
file
/campusdata/galaxy/galaxy/database/job_working_directory/007/7019/galaxy_7019.sh
galaxy.jobs.runners.drmaa DEBUG 2016-01-19 11:24:46,808 (7019) native
specification is: -R y -pe mpi 8 -q small.q
galaxy.jobs DEBUG 2016-01-19 11:24:46,828 (7019) Changing ownership of
working directory with: /usr/bin/sudo -E scripts/external_chown_script.py
/campusdata/galaxy/galaxy/database/job_working_directory/007/7019 eshell
100000
galaxy.jobs.runners.drmaa DEBUG 2016-01-19 11:24:47,020 (7019) submitting
with credentials: eshell [uid: 38559]
galaxy.jobs.runners.drmaa DEBUG 2016-01-19 11:24:47,129 (7019) Job script
for external submission is:
/campusdata/galaxy/galaxy/database/pbs/7019.jt_json
galaxy.jobs.runners.drmaa INFO 2016-01-19 11:24:47,130 Running command
['/usr/bin/sudo', '-E', 'scripts/drmaa_external_runner.py', '38559',
'/campusdata/galaxy/galaxy/database/pbs/7019.jt_json']
galaxy.jobs.runners.drmaa INFO 2016-01-19 11:24:47,981 (7019) queued as
116563
galaxy.jobs DEBUG 2016-01-19 11:24:48,198 (7019) Persisting job
destination (destination id: sge_default)
galaxy.jobs.runners.drmaa DEBUG 2016-01-19 11:24:48,823 (7019/116563)
state change: job is queued and active
69.181.235.240 - - [19/Jan/2016:11:24:45 -0700] "GET
/api/histories/fb86c918c0d3d33b/contents?dataset_details=bae154fe2294752e%2C6fe732485990d2ac%2C604c4e6e60e997bc%2Cf015f1cb819ec50e%2C9f6f4b3cb6cf43eb%2C3d13d598882b6eb8%2C551006fddcb290ae%2C10b9bbc646c48387%2C7670dfdf35146bc5%2Ce0ec2cf59f1fc79e%2Cee30922e5e4854db%2C9e7a0ba216194210
HTTP/1.1" 200 - "https://galaxy.soe.ucsc.edu/" "Mozilla/5.0 (Windows NT
10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/47.0.2526.111 Safari/537.36"
galaxy.jobs.runners.drmaa DEBUG 2016-01-19 11:24:53,532 (7019/116563)
state change: job is running
galaxy.jobs WARNING 2016-01-19 11:24:53,922 (7019) Ignoring state change
from 'error' to 'running' for job that is already terminal
69.181.235.240 - - [19/Jan/2016:11:24:54 -0700] "GET
/api/histories/fb86c918c0d3d33b/contents HTTP/1.1" 200 - "
https://galaxy.soe.ucsc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
69.181.235.240 - - [19/Jan/2016:11:24:55 -0700] "GET
/api/histories/fb86c918c0d3d33b HTTP/1.1" 200 - "
https://galaxy.soe.ucsc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
galaxy.jobs.runners.drmaa INFO 2016-01-19 11:25:06,240 (7019/116563) job
left DRM queue with following message: code 18: The job specified by the
'jobid' does not exist.
galaxy.jobs DEBUG 2016-01-19 11:25:06,412 (7019) Changing ownership of
working directory with: /usr/bin/sudo -E scripts/external_chown_script.py
/campusdata/galaxy/galaxy/database/job_working_directory/007/7019 galaxy
59997
galaxy.jobs DEBUG 2016-01-19 11:25:06,622 (7019) Changing ownership of
working directory with: /usr/bin/sudo -E scripts/external_chown_script.py
/campusdata/galaxy/galaxy/database/job_working_directory/007/7019 galaxy
59997
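
Since the web access lines make the job's history hard to follow, here is a minimal sketch of how the records for one Galaxy job could be filtered out of paster.log. It assumes each record sits on a single line in the real file (the wrapping above comes from my mail client) and that records carry the "(galaxy_id)" or "(galaxy_id/drm_id)" tag right after the timestamp, as in the excerpt. The names RECORD and job_events are made up for this illustration and are not part of Galaxy.

```python
import re

# Matches records shaped like the ones in the excerpt, for example:
# galaxy.jobs.runners.drmaa DEBUG 2016-01-19 11:24:53,532 (7019/116563) state change: job is running
# RECORD and job_events are hypothetical names, not Galaxy APIs.
RECORD = re.compile(
    r"^(?P<logger>galaxy\.[\w.]+) (?P<level>[A-Z]+) "
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\((?P<galaxy_id>\d+)(?:/(?P<drm_id>\d+))?\) (?P<message>.*)$"
)


def job_events(lines, galaxy_id):
    """Return (time, logger, message) tuples for one Galaxy job id."""
    events = []
    for line in lines:
        match = RECORD.match(line.strip())
        if match and match.group("galaxy_id") == str(galaxy_id):
            events.append(
                (match.group("time"), match.group("logger"), match.group("message"))
            )
    return events


# Three records from the excerpt, unwrapped onto single lines.
sample = [
    "galaxy.jobs.runners.drmaa DEBUG 2016-01-19 11:24:53,532 (7019/116563) state change: job is running",
    "galaxy.jobs WARNING 2016-01-19 11:24:53,922 (7019) Ignoring state change from 'error' to 'running' for job that is already terminal",
    "galaxy.tools.actions.__init__ INFO 2016-01-19 11:24:43,236 Verified access to datasets (0.023 ms)",
]

for event in job_events(sample, 7019):
    print(event)
```

Records without a job tag after the timestamp (such as the "Verified access to datasets" line) are skipped, so only the per-job handler and runner messages remain.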


'qacct -j -o eshell' shows that Galaxy job 7019 (SGE job 116563) completed
successfully, though:

qname        all.q
hostname     campusrocks2-0-4.local
group        users
owner        eshell
project      NONE
department   defaultdepartment
jobname      g7019_ucsc_table_direct1_eshell_ucsc_edu
jobnumber    116563
taskid       undefined
account      sge
priority     0
qsub_time    Tue Jan 19 11:24:47 2016
start_time   Tue Jan 19 11:24:53 2016
end_time     Tue Jan 19 11:25:05 2016
granted_pe   mpi
slots        8
failed       0
exit_status  0
ru_wallclock 12
ru_utime     9.285
ru_stime     0.908
ru_maxrss    98384
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    81778
ru_majflt    2
ru_nswap     0
ru_inblock   6728
ru_oublock   184
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     13952
ru_nivcsw    301
cpu          10.192
mem          2.326
io           0.150
iow          0.000
maxvmem      448.820M
arid         undefined
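
For anyone who wants to check the accounting record from a script rather than by eye, this is a rough sketch of how the qacct output above could be read. It assumes one "field value" pair per line, as shown, and treats "failed 0" together with "exit_status 0" as a clean finish. The function names parse_qacct and finished_cleanly are invented for this example.

```python
def parse_qacct(text):
    """Parse one qacct job record into a dict of field name -> value string."""
    record = {}
    for line in text.splitlines():
        # Split on the first run of whitespace; lines without a value
        # (separators, blank lines) are ignored.
        parts = line.split(None, 1)
        if len(parts) == 2:
            record[parts[0]] = parts[1].strip()
    return record


def finished_cleanly(record):
    """True when the scheduler reports no failure and a zero exit status."""
    return record.get("failed") == "0" and record.get("exit_status") == "0"


# A few fields copied from the record above.
sample = """\
owner        eshell
jobname      g7019_ucsc_table_direct1_eshell_ucsc_edu
jobnumber    116563
end_time     Tue Jan 19 11:25:05 2016
failed       0
exit_status  0
"""

record = parse_qacct(sample)
print(record["jobnumber"], finished_cleanly(record))
```

The values are kept as strings because some fields (times, "undefined", "448.820M") are not plain numbers.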


Why does Galaxy not see the job after it has been submitted to the cluster?

Thanks in advance for your help!

-- 
Eric Shell
UNIX Software & Google Apps Administrator
Baskin School of Engineering
UC Santa Cruz
831 459 4919