Hello Dr. Krampis,

  At present, the LWR is most valuable when there is no shared
file system between the server executing the jobs and the server
hosting Galaxy. In your case you seem to have a shared file system, so
I think setting up something like Sun Grid Engine to manage the jobs
and using the DRMAA job runner would be the best route forward.

  The LWR and the corresponding Galaxy job runner coordinate to stage
jobs, but the important point is that the LWR should be the only thing
writing to its staging directory. In your case you have configured the
LWR and Galaxy to use the same directory. You should change this
configuration immediately; I am worried the LWR is going to delete or
overwrite files maintained by Galaxy. I am sorry for the confusion, and
I will update the documentation to warn against this explicitly.
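
  To make the collision concrete, here is a minimal Python sketch. The
directory layouts are assumptions inferred from the log lines in this
thread (Galaxy: <job_working_directory>/000/<job_id>; LWR:
<staging_directory>/<job_id>), and the helpers galaxy_job_dir,
lwr_job_dir, directories_overlap and demo_collision are hypothetical
names, not part of Galaxy or the LWR.

```python
import errno
import os
import tempfile


def galaxy_job_dir(job_working_directory, job_id):
    # ASSUMED layout, inferred from "/mnt/shared/000/128" in the Galaxy log:
    # <job_working_directory>/<job_id // 1000, zero-padded to 3 digits>/<job_id>
    return os.path.join(job_working_directory, "%03d" % (job_id // 1000), str(job_id))


def lwr_job_dir(staging_directory, job_id):
    # ASSUMED layout, inferred from the LWR errors: <staging_directory>/<job_id>
    return os.path.join(staging_directory, str(job_id))


def is_inside(path, root):
    """Return True if path is root itself or lies somewhere beneath it."""
    path, root = os.path.realpath(path), os.path.realpath(root)
    return os.path.commonpath([path, root]) == root


def directories_overlap(a, b):
    """Return True if one directory tree contains the other (or they are equal)."""
    return is_inside(a, b) or is_inside(b, a)


def demo_collision():
    """Replay setup 1 for job 128 in a temp dir; True if mkdir fails with EEXIST."""
    with tempfile.TemporaryDirectory() as shared:
        # Galaxy creates its working directory for the job first ...
        os.makedirs(galaxy_job_dir(shared, 128))
        # ... then the LWR, staging into <shared>/000, calls os.mkdir on the same path.
        try:
            os.mkdir(lwr_job_dir(os.path.join(shared, "000"), 128))
        except OSError as exc:
            return exc.errno == errno.EEXIST
    return False


if __name__ == "__main__":
    # Setup 1: both sides resolve job 128 to /mnt/shared/000/128.
    print(galaxy_job_dir("/mnt/shared", 128) == lwr_job_dir("/mnt/shared/000", 128))
    # Setup 2: the LWR staging tree is Galaxy's job_working_directory itself.
    print(directories_overlap("/mnt/shared", "/mnt/shared"))
    # A separate (hypothetical) staging directory does not overlap.
    print(directories_overlap("/mnt/shared", "/mnt/lwr_staging"))
    print(demo_collision())
```

  With staging_directory = /mnt/shared/000 both sides compute
/mnt/shared/000/128, so the LWR's os.mkdir fails with Errno 17 as in
setup 1. With staging_directory = /mnt/shared the per-job paths differ,
but both trees still live under the same directory.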

  If you still feel there is a compelling reason to use the LWR in
this situation, just change staging_directory in the LWR configuration
(server.ini) to a directory that Galaxy does not write to. It has long
been on my TODO list to allow one to disable file staging with the LWR
(or disable it selectively by path regex/glob); it seems like that is
what would help in your situation as well. Let me know if that is of
interest to you.
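
  For what it is worth, the selective version might look roughly like
the following Python sketch. Nothing like this exists in the LWR today;
should_stage, partition_for_staging and the skip_globs/skip_regexes
options are hypothetical names used only to illustrate matching paths
against globs (fnmatch) and regular expressions (re).

```python
import fnmatch
import re


def should_stage(path, skip_globs=(), skip_regexes=()):
    """HYPOTHETICAL: return False for paths that should NOT be staged.

    Files the remote node can already read (e.g. on a shared mount) would be
    skipped; everything else would be copied as it is today.
    """
    if any(fnmatch.fnmatch(path, pattern) for pattern in skip_globs):
        return False
    if any(re.search(regex, path) for regex in skip_regexes):
        return False
    return True


def partition_for_staging(paths, skip_globs=(), skip_regexes=()):
    """Split paths into (to_stage, to_skip), preserving input order."""
    to_stage, to_skip = [], []
    for path in paths:
        target = to_stage if should_stage(path, skip_globs, skip_regexes) else to_skip
        target.append(path)
    return to_stage, to_skip


if __name__ == "__main__":
    inputs = [
        "/mnt/shared/files/000/dataset_170.dat",  # hypothetical file on the shared mount
        "/home/galaxy/tool-data/local.loc",       # hypothetical file only on the Galaxy host
    ]
    to_stage, to_skip = partition_for_staging(inputs, skip_globs=["/mnt/shared/*"])
    print(to_stage)  # ['/home/galaxy/tool-data/local.loc']
    print(to_skip)   # ['/mnt/shared/files/000/dataset_170.dat']
```

  Note that fnmatch's * also matches "/", so a single pattern such as
/mnt/shared/* covers everything below the mount.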

-John


On Mon, May 6, 2013 at 8:32 AM, Krampis, Konstantinos <kkram...@jcvi.org> wrote:
> Hi all,
>
>   I am trying to set up a Galaxy cluster using the LWR runner. The nodes have
> a shared filesystem, and in universe_wsgi.ini this parameter is set:
>
> job_working_directory = /mnt/shared
> ...
> clustalw = lwr://http://192.168.33.12:8913
> ....
>
> This folder has been "chown-ed" to the galaxy user and is also "a+w",
> and I have verified that it can be read and written by ssh-ing to
> each node of the cluster. The sticky bit is set.
>
> When I try to run jobs (I used clustalw as an example), there seems to be
> confusion between where Galaxy puts files and where LWR tries to read
> them from. Here are two setups that error out:
>
> 1) With the following set in the LWR server.ini:
> staging_directory = /mnt/shared/000
>
> galaxy error:
>
> galaxy.jobs DEBUG 2013-05-06 10:21:22,320 (128) Working directory for job is: /mnt/shared/000/128
> galaxy.jobs.handler DEBUG 2013-05-06 10:21:22,320 dispatching job 128 to lwr runner
> galaxy.jobs.handler INFO 2013-05-06 10:21:22,427 (128) Job dispatched
> galaxy.datatypes.metadata DEBUG 2013-05-06 10:21:22,875 Cleaning up external metadata files
> galaxy.jobs.runners.lwr ERROR 2013-05-06 10:21:22,902 failure running job 128
>
>
> lwr error (on the cluster node):
>
>   File "/home/vagrant/jmchilton-lwr-5213f6dce32d/lwr/app.py", line 81, in setup
>     manager.setup_job_directory(job_id)
>   File "/home/vagrant/jmchilton-lwr-5213f6dce32d/lwr/manager.py", line 101, in setup_job_directory
>     os.mkdir(job_directory)
> OSError: [Errno 17] File exists: '/mnt/shared/000/128'
>
> 2) With the following set in the LWR server.ini:
> staging_directory = /mnt/shared
>
> galaxy error:
>
> galaxy.jobs DEBUG 2013-05-06 10:28:46,872 (129) Working directory for job is: /mnt/shared/000/129
> galaxy.jobs.handler DEBUG 2013-05-06 10:28:46,872 dispatching job 129 to lwr runner
> galaxy.jobs.handler INFO 2013-05-06 10:28:46,967 (129) Job dispatched
> 192.168.33.1 - - [06/May/2013:10:28:48 -0200] "GET /api/histories/2a56795cad3c7db3 HTTP/1.1" 200 - "http://192.168.33.11:8080/history" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31"
> galaxy.jobs.runners.lwr DEBUG 2013-05-06 10:28:50,653 run_results {'status': 'status', 'returncode': 0, 'complete': 'true', 'stderr': '', 'stdout': ''}
> galaxy.datatypes.metadata DEBUG 2013-05-06 10:28:50,970 Cleaning up external metadata files
> galaxy.jobs.runners.lwr ERROR 2013-05-06 10:28:51,050 failure running job 129
>
>
> lwr error (on the cluster node):
>
>     resp.app_iter = FileIterator(result)
>   File "/home/vagrant/jmchilton-lwr-5213f6dce32d/lwr/framework.py", line 111, in __init__
>     self.input = open(path, 'rb')
> IOError: [Errno 2] No such file or directory: u'/mnt/shared/129/outputs/dataset_170.dat'
>
> The full error stacks for setup 1 are at the end of this email. It might be
> something very simple that I am missing, but any feedback would be greatly
> appreciated. Thanks!
>
> Ntino
>
>
>
> --
> Konstantinos (Ntino) Krampis, Ph.D.
> Asst. Professor, Informatics
> J.Craig Venter Institute
>
> kkram...@jcvi.org
> agbio...@gmail.com
> +1-540-200-8277
>
> Web:
> http://bit.ly/cloud-research
> http://cloudbiolinux.org/
> http://twitter.com/agbiotec
>
>
>
> ---- GALAXY ERROR
>
> galaxy.jobs DEBUG 2013-05-06 10:21:22,320 (128) Working directory for job is: /mnt/shared/000/128
> galaxy.jobs.handler DEBUG 2013-05-06 10:21:22,320 dispatching job 128 to lwr runner
> galaxy.jobs.handler INFO 2013-05-06 10:21:22,427 (128) Job dispatched
> galaxy.datatypes.metadata DEBUG 2013-05-06 10:21:22,875 Cleaning up external metadata files
> galaxy.jobs.runners.lwr ERROR 2013-05-06 10:21:22,902 failure running job 128
> Traceback (most recent call last):
>   File "/home/vagrant/galaxy-dist/lib/galaxy/jobs/runners/lwr.py", line 286, in run_job
>     file_stager = FileStager(client, command_line, job_wrapper.extra_filenames, input_files, output_files, job_wrapper.tool.tool_dir)
>   File "/home/vagrant/galaxy-dist/lib/galaxy/jobs/runners/lwr.py", line 40, in __init__
>     job_config = client.setup()
>   File "/home/vagrant/galaxy-dist/lib/galaxy/jobs/runners/lwr.py", line 212, in setup
>     return self.__raw_execute_and_parse("setup", { "job_id" : self.job_id })
>   File "/home/vagrant/galaxy-dist/lib/galaxy/jobs/runners/lwr.py", line 150, in __raw_execute_and_parse
>     response = self.__raw_execute(command, args, data)
>   File "/home/vagrant/galaxy-dist/lib/galaxy/jobs/runners/lwr.py", line 146, in __raw_execute
>     response = self.url_open(request, data)
>   File "/home/vagrant/galaxy-dist/lib/galaxy/jobs/runners/lwr.py", line 134, in url_open
>     return urllib2.urlopen(request, data)
>   File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
>     return _opener.open(url, data, timeout)
>   File "/usr/lib/python2.7/urllib2.py", line 406, in open
>     response = meth(req, response)
>   File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
>     'http', request, response, code, msg, hdrs)
>   File "/usr/lib/python2.7/urllib2.py", line 444, in error
>     return self._call_chain(*args)
>   File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
>     result = func(*args)
>   File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
>     raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
> HTTPError: HTTP Error 500: Internal Server Error
>
>
>
> ---- LWR ERROR
>
> Exception happened during processing of request from ('192.168.33.11', 44802)
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 1068, in process_request_in_thread
>     self.finish_request(request, client_address)
>   File "/usr/lib/python2.7/SocketServer.py", line 323, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/usr/lib/python2.7/SocketServer.py", line 638, in __init__
>     self.handle()
>   File "/usr/local/lib/python2.7/dist-packages/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 442, in handle
>     BaseHTTPRequestHandler.handle(self)
>   File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
>     self.handle_one_request()
>   File "/usr/local/lib/python2.7/dist-packages/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 437, in handle_one_request
>     self.wsgi_execute()
>   File "/usr/local/lib/python2.7/dist-packages/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 287, in wsgi_execute
>     self.wsgi_start_response)
>   File "/home/vagrant/jmchilton-lwr-5213f6dce32d/lwr/framework.py", line 35, in __call__
>     return controller(environ, start_response, **request_args)
>   File "/home/vagrant/jmchilton-lwr-5213f6dce32d/lwr/framework.py", line 90, in controller_replacement
>     result = func(**args)
>   File "/home/vagrant/jmchilton-lwr-5213f6dce32d/lwr/app.py", line 81, in setup
>     manager.setup_job_directory(job_id)
>   File "/home/vagrant/jmchilton-lwr-5213f6dce32d/lwr/manager.py", line 101, in setup_job_directory
>     os.mkdir(job_directory)
> OSError: [Errno 17] File exists: '/mnt/shared/000/128'
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>   http://lists.bx.psu.edu/
>
> To search Galaxy mailing lists use the unified search at:
>   http://galaxyproject.org/search/mailinglists/
