Glad this almost worked. I'm not sure what the problem is yet. I'd open
the file /cluster/galaxy/pulsar/pulsar/managers/util/drmaa/__init__.py
and add some logging right before the line that reads
return self.session.jobStatus(str(external_job_id)), for example:

log.info("Fetching job status for %s", external_job_id)

or something like that. Then check whether the logged ID matches a job ID
that your queuing software knows about. It might carry an extra prefix or
something similar that we can strip off.
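
Here is a rough sketch of that logging-and-stripping idea, in case it helps.
The function names, the logger name and the prefixes argument are made up
for illustration and are not part of Pulsar; the only real call is
session.jobStatus(), the one shown in the traceback. It assumes the ID
only needs whitespace trimmed and, optionally, a known prefix removed.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("pulsar.drmaa.debug")


def strip_known_prefix(external_job_id, prefixes=()):
    """Return the job ID as a string, trimmed, with the first matching prefix removed."""
    job_id = str(external_job_id).strip()
    for prefix in prefixes:
        if prefix and job_id.startswith(prefix):
            return job_id[len(prefix):]
    return job_id


def job_status_with_logging(session, external_job_id, prefixes=()):
    """Log the raw and cleaned job ID, then ask the DRMAA session for the status."""
    cleaned = strip_known_prefix(external_job_id, prefixes)
    log.info("Fetching job status for %r (cleaned: %r)", external_job_id, cleaned)
    return session.jobStatus(cleaned)
```

In __init__.py the return line could then become something like
return job_status_with_logging(self.session, external_job_id). I would
only pass prefixes once the log shows what the extra text actually is.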

It would also be interesting to try Pulsar 0.7.3 against Galaxy 16.07,
since this may be caused by a problem that has already been fixed.

-John


On Thu, Oct 13, 2016 at 2:06 PM, Poole, Richard <r.po...@ucl.ac.uk> wrote:
> Hey John,
>
> So I’ve been happily using Pulsar to send all my Galaxy server jobs to our
> cluster here at UCL for several months now (I love it!). I am now exploring
> the ‘run-as-real-user’ option for DRMAA submissions and have run into a
> problem. The files are correctly staged, correctly chowned, successfully
> submitted to the queue and the job runs. However, the step at job end
> (collection?) fails with the following error message in Pulsar:
>
> Exception happened during processing of request from ('*.*.*.*', 54321)
> Traceback (most recent call last):
>   File
> "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py",
> line 1072, in process_request_in_thread
>     self.finish_request(request, client_address)
>   File "/opt/rocks/lib/python2.6/SocketServer.py", line 322, in
> finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/opt/rocks/lib/python2.6/SocketServer.py", line 617, in __init__
>     self.handle()
>   File
> "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py",
> line 446, in handle
>     BaseHTTPRequestHandler.handle(self)
>   File "/opt/rocks/lib/python2.6/BaseHTTPServer.py", line 329, in handle
>     self.handle_one_request()
>   File
> "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py",
> line 441, in handle_one_request
>     self.wsgi_execute()
>   File
> "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py",
> line 291, in wsgi_execute
>     self.wsgi_start_response)
>   File "/cluster/galaxy/pulsar/pulsar/web/framework.py", line 39, in
> __call__
>     return controller(environ, start_response, **request_args)
>   File "/cluster/galaxy/pulsar/pulsar/web/framework.py", line 144, in
> controller_replacement
>     result = self.__execute_request(func, args, req, environ)
>   File "/cluster/galaxy/pulsar/pulsar/web/framework.py", line 124, in
> __execute_request
>     result = func(**args)
>   File "/cluster/galaxy/pulsar/pulsar/web/routes.py", line 82, in status
>     return status_dict(manager, job_id)
>   File "/cluster/galaxy/pulsar/pulsar/manager_endpoint_util.py", line 12, in
> status_dict
>     job_status = manager.get_status(job_id)
>   File "/cluster/galaxy/pulsar/pulsar/managers/stateful.py", line 95, in
> get_status
>     proxy_status, state_change = self.__proxy_status(job_directory, job_id)
>   File "/cluster/galaxy/pulsar/pulsar/managers/stateful.py", line 115, in
> __proxy_status
>     proxy_status = self._proxied_manager.get_status(job_id)
>   File
> "/cluster/galaxy/pulsar/pulsar/managers/queued_external_drmaa_original.py",
> line 62, in get_status
>     external_status = super(ExternalDrmaaQueueManager,
> self)._get_status_external(external_id)
>   File "/cluster/galaxy/pulsar/pulsar/managers/base/base_drmaa.py", line 31,
> in _get_status_external
>     drmaa_state = self.drmaa_session.job_status(external_id)
>   File "/cluster/galaxy/pulsar/pulsar/managers/util/drmaa/__init__.py", line
> 50, in job_status
>     return self.session.jobStatus(str(external_job_id))
>   File "build/bdist.linux-x86_64/egg/drmaa/session.py", line 518, in
> jobStatus
>     c(drmaa_job_ps, jobId, byref(status))
>   File "build/bdist.linux-x86_64/egg/drmaa/helpers.py", line 299, in c
>     return f(*(args + (error_buffer, sizeof(error_buffer))))
>   File "build/bdist.linux-x86_64/egg/drmaa/errors.py", line 151, in
> error_check
>     raise _ERRORS[code - 1](error_string)
> InvalidJobException: code 18: The job specified by the 'jobid' does not
> exist.
>
> With this corresponding error from my Galaxy server:
>
> galaxy.tools.actions INFO 2016-10-13 18:47:51,851 Handled output (279.421
> ms)
> galaxy.tools.actions INFO 2016-10-13 18:47:52,093 Verified access to
> datasets (5.271 ms)
> galaxy.tools.execute DEBUG 2016-10-13 18:47:52,118 Tool
> [toolshed.g2.bx.psu.edu/repos/devteam/sam_to_bam/sam_to_bam/1.1.4] created
> job [25008] (560.404 ms)
> galaxy.jobs DEBUG 2016-10-13 18:47:52,579 (25008) Working directory for job
> is: /Users/galaxy/galaxy-dist/database/job_working_directory/025/25008
> galaxy.jobs.handler DEBUG 2016-10-13 18:47:52,591 (25008) Dispatching to
> pulsar runner
> galaxy.jobs DEBUG 2016-10-13 18:47:52,677 (25008) Persisting job destination
> (destination id: hpc_low)
> galaxy.jobs.runners DEBUG 2016-10-13 18:47:52,681 Job [25008] queued (90.231
> ms)
> galaxy.jobs.handler INFO 2016-10-13 18:47:52,699 (25008) Job dispatched
> galaxy.tools.deps DEBUG 2016-10-13 18:47:53,138 Building dependency shell
> command for dependency 'samtools'
> galaxy.jobs.runners.pulsar INFO 2016-10-13 18:47:53,233 Pulsar job submitted
> with job_id 25008
> galaxy.jobs DEBUG 2016-10-13 18:47:53,257 (25008) Persisting job destination
> (destination id: hpc_low)
> galaxy.datatypes.metadata DEBUG 2016-10-13 18:51:03,922 Cleaning up external
> metadata files
> galaxy.jobs.runners.pulsar ERROR 2016-10-13 18:51:03,945 failure finishing
> job 25008
> Traceback (most recent call last):
>   File "/Users/galaxy/galaxy-dist/lib/galaxy/jobs/runners/pulsar.py", line
> 386, in finish_job
>     run_results = client.full_status()
>   File "/Users/galaxy/galaxy-dist/lib/pulsar/client/client.py", line 132, in
> full_status
>     return self.raw_check_complete()
>   File "/Users/galaxy/galaxy-dist/lib/pulsar/client/decorators.py", line 28,
> in replacement
>     return func(*args, **kwargs)
>   File "/Users/galaxy/galaxy-dist/lib/pulsar/client/decorators.py", line 13,
> in replacement
>     response = func(*args, **kwargs)
>   File "/Users/galaxy/galaxy-dist/lib/pulsar/client/client.py", line 146, in
> raw_check_complete
>     check_complete_response = self._raw_execute("status", {"job_id":
> self.job_id})
>   File "/Users/galaxy/galaxy-dist/lib/pulsar/client/client.py", line 215, in
> _raw_execute
>     return self.job_manager_interface.execute(command, args, data,
> input_path, output_path)
>   File "/Users/galaxy/galaxy-dist/lib/pulsar/client/interface.py", line 96,
> in execute
>     response = self.transport.execute(url, method=method, data=data,
> input_path=input_path, output_path=output_path)
>   File "/Users/galaxy/galaxy-dist/lib/pulsar/client/transport/standard.py",
> line 34, in execute
>     response = self._url_open(request, data)
>   File "/Users/galaxy/galaxy-dist/lib/pulsar/client/transport/standard.py",
> line 20, in _url_open
>     return urlopen(request, data)
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 154, in urlopen
>     return opener.open(url, data, timeout)
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 437, in open
>     response = meth(req, response)
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 550, in http_response
>     'http', request, response, code, msg, hdrs)
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 475, in error
>     return self._call_chain(*args)
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 409, in _call_chain
>     result = func(*args)
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
> line 558, in http_error_default
>     raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
> HTTPError: HTTP Error 500: Internal Server Error
>
>
>
> I am running 15.10 and Python 2.7.10 on my iMac for the server and the
> cluster submission node is running Pulsar 0.5.0 and Python 2.7.12
>
> For these tests I run Pulsar in an interactive window so I have not set the
> sudoers file up, but rather enter sudo password when requested by Pulsar (at
> the first step of chowning the staging directory). Also have rewrites set up
> in Galaxy’s pulsar_actions.yml and I am using remote_scp for the file
> transfers rather than http - although I have also tried switching back to
> http (as I noticed caching, which I am also testing, does not work with scp
> transfers) but get an identical set of error messages.
>
> As I say, I have no troubles using a regular queued_drmaa manager in pulsar.
> Any ideas what the problem may be?
>
> Cheers,
> Rich
>
>
>
>
>
> Richard J Poole PhD
> Wellcome Trust Fellow
> Department of Cell and Developmental Biology
> University College London
> 518 Rockefeller
> 21 University Street, London WC1E 6DE
> Office (518 Rockefeller): +44 20 7679 6577 (int. 46577)
> Lab (529 Rockefeller): +44 20 7679 6133 (int. 46133)
> https://www.ucl.ac.uk/cdb/academics/poole
>
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>   https://lists.galaxyproject.org/
>
> To search Galaxy mailing lists use the unified search at:
>   http://galaxyproject.org/search/mailinglists/