Hey John,
So I’ve been happily using Pulsar to send all my Galaxy server jobs to our
cluster here at UCL for several months now (I love it!). I am now exploring the
‘run-as-real-user’ option for DRMAA submissions and have run into a problem.
The files are correctly staged, correctly chowned and successfully submitted to
the queue, and the job runs. However, at job end (collection?) it fails with the
following error message in Pulsar:
Exception happened during processing of request from ('*.*.*.*', 54321)
Traceback (most recent call last):
  File "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py", line 1072, in process_request_in_thread
    self.finish_request(request, client_address)
  File "/opt/rocks/lib/python2.6/SocketServer.py", line 322, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/opt/rocks/lib/python2.6/SocketServer.py", line 617, in __init__
    self.handle()
  File "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py", line 446, in handle
    BaseHTTPRequestHandler.handle(self)
  File "/opt/rocks/lib/python2.6/BaseHTTPServer.py", line 329, in handle
    self.handle_one_request()
  File "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py", line 441, in handle_one_request
    self.wsgi_execute()
  File "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py", line 291, in wsgi_execute
    self.wsgi_start_response)
  File "/cluster/galaxy/pulsar/pulsar/web/framework.py", line 39, in __call__
    return controller(environ, start_response, **request_args)
  File "/cluster/galaxy/pulsar/pulsar/web/framework.py", line 144, in controller_replacement
    result = self.__execute_request(func, args, req, environ)
  File "/cluster/galaxy/pulsar/pulsar/web/framework.py", line 124, in __execute_request
    result = func(**args)
  File "/cluster/galaxy/pulsar/pulsar/web/routes.py", line 82, in status
    return status_dict(manager, job_id)
  File "/cluster/galaxy/pulsar/pulsar/manager_endpoint_util.py", line 12, in status_dict
    job_status = manager.get_status(job_id)
  File "/cluster/galaxy/pulsar/pulsar/managers/stateful.py", line 95, in get_status
    proxy_status, state_change = self.__proxy_status(job_directory, job_id)
  File "/cluster/galaxy/pulsar/pulsar/managers/stateful.py", line 115, in __proxy_status
    proxy_status = self._proxied_manager.get_status(job_id)
  File "/cluster/galaxy/pulsar/pulsar/managers/queued_external_drmaa_original.py", line 62, in get_status
    external_status = super(ExternalDrmaaQueueManager, self)._get_status_external(external_id)
  File "/cluster/galaxy/pulsar/pulsar/managers/base/base_drmaa.py", line 31, in _get_status_external
    drmaa_state = self.drmaa_session.job_status(external_id)
  File "/cluster/galaxy/pulsar/pulsar/managers/util/drmaa/__init__.py", line 50, in job_status
    return self.session.jobStatus(str(external_job_id))
  File "build/bdist.linux-x86_64/egg/drmaa/session.py", line 518, in jobStatus
    c(drmaa_job_ps, jobId, byref(status))
  File "build/bdist.linux-x86_64/egg/drmaa/helpers.py", line 299, in c
    return f(*(args + (error_buffer, sizeof(error_buffer))))
  File "build/bdist.linux-x86_64/egg/drmaa/errors.py", line 151, in error_check
    raise _ERRORS[code - 1](error_string)
InvalidJobException: code 18: The job specified by the 'jobid' does not exist.
With this corresponding error from my Galaxy server:
galaxy.tools.actions INFO 2016-10-13 18:47:51,851 Handled output (279.421 ms)
galaxy.tools.actions INFO 2016-10-13 18:47:52,093 Verified access to datasets (5.271 ms)
galaxy.tools.execute DEBUG 2016-10-13 18:47:52,118 Tool [toolshed.g2.bx.psu.edu/repos/devteam/sam_to_bam/sam_to_bam/1.1.4] created job [25008] (560.404 ms)
galaxy.jobs DEBUG 2016-10-13 18:47:52,579 (25008) Working directory for job is: /Users/galaxy/galaxy-dist/database/job_working_directory/025/25008
galaxy.jobs.handler DEBUG 2016-10-13 18:47:52,591 (25008) Dispatching to pulsar runner
galaxy.jobs DEBUG 2016-10-13 18:47:52,677 (25008) Persisting job destination (destination id: hpc_low)
galaxy.jobs.runners DEBUG 2016-10-13 18:47:52,681 Job [25008] queued (90.231 ms)
galaxy.jobs.handler INFO 2016-10-13 18:47:52,699 (25008) Job dispatched
galaxy.tools.deps DEBUG 2016-10-13 18:47:53,138 Building dependency shell command for dependency 'samtools'
galaxy.jobs.runners.pulsar INFO 2016-10-13 18:47:53,233 Pulsar job submitted with job_id 25008
galaxy.jobs DEBUG 2016-10-13 18:47:53,257 (25008) Persisting job destination (destination id: hpc_low)
galaxy.datatypes.metadata DEBUG 2016-10-13 18:51:03,922 Cleaning up external metadata files
galaxy.jobs.runners.pulsar ERROR 2016-10-13 18:51:03,945 failure finishing job 25008
Traceback (most recent call last):
  File "/Users/galaxy/galaxy-dist/lib/galaxy/jobs/runners/pulsar.py", line 386, in finish_job
    run_results = client.full_status()
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/client.py", line 132, in full_status
    return self.raw_check_complete()
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/decorators.py", line 28, in replacement
    return func(*args, **kwargs)
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/decorators.py", line 13, in replacement
    response = func(*args, **kwargs)
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/client.py", line 146, in raw_check_complete
    check_complete_response = self._raw_execute("status", {"job_id": self.job_id})
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/client.py", line 215, in _raw_execute
    return self.job_manager_interface.execute(command, args, data, input_path, output_path)
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/interface.py", line 96, in execute
    response = self.transport.execute(url, method=method, data=data, input_path=input_path, output_path=output_path)
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/transport/standard.py", line 34, in execute
    response = self._url_open(request, data)
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/transport/standard.py", line 20, in _url_open
    return urlopen(request, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error
I am running Galaxy 15.10 and Python 2.7.10 on my iMac for the server, and the
cluster submission node is running Pulsar 0.5.0 and Python 2.7.12.
For these tests I run Pulsar in an interactive window, so I have not set up the
sudoers file; instead I enter the sudo password when Pulsar requests it (at the
first step, chowning the staging directory). I also have rewrites set up in
Galaxy’s pulsar_actions.yml, and I am using remote_scp for the file transfers
rather than http. I have also tried switching back to http (as I noticed that
caching, which I am also testing, does not work with scp transfers), but I get
an identical set of error messages.
As I say, I have no trouble using a regular queued_drmaa manager in Pulsar.
Any ideas what the problem may be?
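
In case it helps frame the question: my guess, and it is only a guess, is that
the status call raises because the DRMAA session can no longer see the job
once it has left the queue, and that the uncaught exception is what turns into
the HTTP 500 on the Galaxy side. Below is a minimal sketch of the kind of guard
I have in mind. It is not Pulsar's code: FakeSession, safe_job_status and the
"complete" fallback are all hypothetical names and choices of mine, and the
exception class is a local stand-in for drmaa.errors.InvalidJobException so the
sketch runs without a cluster.

```python
class InvalidJobException(Exception):
    """Local stand-in for drmaa.errors.InvalidJobException (assumption)."""


class FakeSession(object):
    """Hypothetical stand-in for a DRMAA session that only knows some job ids."""

    def __init__(self, known):
        # Map of job id (string) -> state string for jobs the session can see.
        self.known = dict(known)

    def jobStatus(self, job_id):
        # Mimic the failure in the traceback: unknown ids raise.
        if job_id not in self.known:
            raise InvalidJobException(
                "code 18: The job specified by the 'jobid' does not exist."
            )
        return self.known[job_id]


def safe_job_status(session, external_id, missing_state="complete"):
    """Return the job state, or missing_state if the session no longer knows the job.

    Treating a missing job as "complete" is an assumption for illustration only;
    the job could equally have failed or been deleted.
    """
    try:
        return session.jobStatus(str(external_id))
    except InvalidJobException:
        return missing_state


session = FakeSession({"101": "running"})
print(safe_job_status(session, 101))
print(safe_job_status(session, 999))
```

The first call goes through the normal path; the second takes the fallback
instead of letting the exception propagate up to the web layer.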
Cheers,
Rich
Richard J Poole PhD
Wellcome Trust Fellow
Department of Cell and Developmental Biology
University College London
518 Rockefeller
21 University Street, London WC1E 6DE
Office (518 Rockefeller): +44 20 7679 6577 (int. 46577)
Lab (529 Rockefeller): +44 20 7679 6133 (int. 46133)
https://www.ucl.ac.uk/cdb/academics/poole