Hi Starr,

I'd suggest setting 'retry_job_output_collection' in universe_wsgi.ini
to some value > 0 (e.g. 5).  You may also want to try temporarily
remounting the filesystem on which your working directories are located
with the `noac` mount option, just to rule out that the problem is
related to attribute caching.

The noac option is not a good idea to use in production due to the
performance penalty of disabling attribute caching, but it'd be useful
for debugging the problem.
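
For reference, here is a rough sketch of how the first suggestion might
look in universe_wsgi.ini.  I'm assuming the option goes in the
[app:main] section alongside your other settings, and the value 5 is
only the example from above, not a tuned recommendation.

```ini
[app:main]
# Assumed placement: the same section as file_path, new_file_path, etc.
# Number of times to retry collecting the job's output files before
# giving up.  5 is an example value only.
retry_job_output_collection = 5
```

Galaxy would need to be restarted to pick up the change.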

--nate

On Fri, Dec 6, 2013 at 5:47 PM, Hazard, E. Starr <haza...@musc.edu> wrote:
> This is a standalone instance of Galaxy, built Nov 28 on a small research
> cluster using LSF and built on RHEL 6.2.
>
>
> galaxy.tools.actions.upload_common INFO 2013-12-05 14:33:01,106 tool
> upload1 created job id 10
>  Working directory for job is:
> /depot/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/10
> galaxy.jobs.handler DEBUG 2013-12-05 14:33:01,290 (10) Dispatching to
> drmaa runner
> galaxy.jobs DEBUG 2013-12-05 14:33:01,441 (10) Persisting job destination
> (destination id: drmaa)
> galaxy.jobs.handler INFO 2013-12-05 14:33:01,471 (10) Job dispatched
> galaxy.tools.deps DEBUG 2013-12-05 14:33:01,693 Building dependency shell
> command for dependency 'samtools'
> galaxy.tools.deps WARNING 2013-12-05 14:33:01,693 Failed to resolve
> dependency on 'samtools', ignoring
> galaxy.jobs.runners.drmaa DEBUG 2013-12-05 14:33:02,352 (10) submitting
> file
> /depot/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/10/
> galaxy_10.sh
> galaxy.jobs.runners.drmaa DEBUG 2013-12-05 14:33:02,352 (10) command is:
> python….
> galaxy.jobs DEBUG 2013-12-05 14:33:02,379 (10) Changing ownership of
> working directory with: /usr/bin/sudo -E scripts/external_chown_script.py
> /depot/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/10
> galaxy 50982
> galaxy.jobs.runners.drmaa DEBUG 2013-12-05 14:33:02,528 (10) submitting
> with credentials: galaxy [uid: 50981]
> galaxy.jobs.runners.drmaa DEBUG 2013-12-05 14:33:02,574 (10) Job script
> for external submission is:
> /depot/shared/app/Galaxy/galaxy-dist/database/lsf/10.jt_json
> galaxy.jobs.runners.drmaa INFO 2013-12-05 14:33:02,772 (10) queued as Job
> <28526> is submitted to default queue <medium_priority>.
> 28526
> galaxy.jobs DEBUG 2013-12-05 14:33:02,837 (10) Persisting job destination
> (destination id: drmaa)
> galaxy.jobs.runners.drmaa INFO 2013-12-05 14:33:03,237 (10/Job <28526> is
> submitted to default queue <medium_priority>.
> 28526) job left DRM queue with following message: code 18: invalid LSF job
> id: Job <28526> is submitted to default queue <medium_priority>.
> 28526
> galaxy.jobs DEBUG 2013-12-05 14:33:03,303 (10) Changing ownership of
> working directory with: /usr/bin/sudo -E scripts/external_chown_script.py
> /depot/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/10
> galaxy 50982
> 128.23.163.166 - - [05/Dec/2013:14:33:05 -0400] "GET
> /api/histories/50a7a2e81473b416/contents HTTP/1.1" 200 -
> "http://hpcc3.musc.edu:8089/root" "Mozilla/5.0 (Macintosh; Intel Mac OS X
> 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
> galaxy.jobs.runners ERROR 2013-12-05 14:33:08,661 (10/Job <28526> is
> submitted to default queue <medium_priority>.
> 28526) Job output not returned from cluster: [Errno 2] No such file or
> directory:
> '/depot/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/10
> /galaxy_10.o'
> galaxy.jobs DEBUG 2013-12-05 14:33:08,701 (10) Changing ownership of
> working directory with: /usr/bin/sudo -E scripts/external_chown_script.py
> /depot/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/10
> galaxy 50982
> galaxy.jobs DEBUG 2013-12-05 14:33:09,049 finish(): Moved
> /depot/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/10/
> galaxy_dataset_16.dat to
> /depot/shared/app/Galaxy/galaxy-dist/database/files/000/dataset_16.dat
> galaxy.jobs DEBUG 2013-12-05 14:33:09,049 finish(): Moved
> /depot/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/10/
> galaxy_dataset_15.dat to
> /depot/shared/app/Galaxy/galaxy-dist/database/files/000/dataset_15.dat
> galaxy.jobs DEBUG 2013-12-05 14:33:09,105 setting dataset state to ERROR
> galaxy.jobs DEBUG 2013-12-05 14:33:09,126 setting dataset state to ERROR
> galaxy.jobs DEBUG 2013-12-05 14:33:09,214 job 10 ended
>
>
>
>
> "Job output not returned from cluster" appears in History, BUT in fact
> files are being written to directories indicated in my universe_wsgi.ini
>
>
>
> I am having no success getting a solution to this error "job left DRM
> queue with following message: code 18: invalid LSF job id"
>
>
> This file "No such file or directory:
> '/depot/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/10
> /galaxy_10.o'" exists AFTER the job finishes. So at 2013-12-05 14:33:03
> the file had not been written but does appear later.
>
> This operation is completing before the file can be uploaded, and so an
> empty file is moved:
> Moved
> /depot/shared/app/Galaxy/galaxy-dist/database/job_working_directory/000/10/
> galaxy_dataset_15.dat to
> /depot/shared/app/Galaxy/galaxy-dist/database/files/000/dataset_15.dat
>
>
>
> From my universe_wsgi.ini:
>
> file_path = database/files
> new_file_path = database/job_working_directory
> job_working_directory = database/job_working_directory
> cluster_files_directory = database/job_working_directory
>
>
>
>
>
>
>
> Any troubleshooting suggestions appreciated.
>
> Starr
>
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>   http://lists.bx.psu.edu/
>
> To search Galaxy mailing lists use the unified search at:
>   http://galaxyproject.org/search/mailinglists/
