On Feb 8, 2012, at 11:58 AM, Ryan Golhar wrote:

> Hi Nate - I finally got a chance to look at this briefly, but I must admit, 
> my Python skills are lacking.  In the Bam class in binary.py, all I see are 
> calls to 
> 
> proc = subprocess.Popen( args=command, shell=True, cwd=tmp_dir, stderr=open( stderr_name, 'wb' ) )
> 
> which, to me, look like calls to execute a command.  So maybe Galaxy is 
> running samtools on the webserver because of this?
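For what it's worth, a Popen call like the one quoted starts the command in a shell on whichever machine is running that Python process. Nothing in the call itself hands work to PBS. Below is a minimal sketch of the same pattern. The helper name `run_in_shell` is hypothetical, and `uname -n` stands in for the real samtools command so you can see which host actually ran the child:

```python
import os
import subprocess
import tempfile


def run_in_shell(command, tmp_dir):
    """Run `command` through the shell on the current host, capturing stderr to a file.

    Mirrors the Popen(args=command, shell=True, cwd=tmp_dir, stderr=...) pattern
    quoted above; `run_in_shell` itself is a hypothetical name.
    """
    stderr_name = os.path.join(tmp_dir, "stderr.txt")
    with open(stderr_name, "wb") as stderr_file:
        proc = subprocess.Popen(args=command, shell=True, cwd=tmp_dir, stderr=stderr_file)
        returncode = proc.wait()  # block until the child exits
    with open(stderr_name, "rb") as fh:
        return returncode, fh.read().decode()


if __name__ == "__main__":
    # `uname -n` prints the name of the machine that actually ran the child process.
    code, err = run_in_shell("uname -n 1>&2", tempfile.mkdtemp())
    print(code, err.strip())
```

Run it on the head node and on a compute node and you get different names. That is the point: whether samtools lands on the head node depends only on which process reaches this call.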

This is indeed the place in the code where samtools is called, but that code can
be called either from within the external metadata setting tool or from the job
runner.  In your case, it's happening in the job runner despite having
set_metadata_externally = True.  Could you check the conditional from the
earlier email I sent:

The relevant code is in galaxy-dist/lib/galaxy/datatypes/binary.py, in the Bam 
class.  When Galaxy runs a tool, it creates a Job, which is placed inside a 
JobWrapper in lib/galaxy/jobs/__init__.py.  After the job execution is 
complete, the JobWrapper.finish() method is called, which contains:

        if not self.app.config.set_metadata_externally or \
           ( not self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session ) \
             and self.app.config.retry_metadata_internally ):
            dataset.set_meta( overwrite = False )
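To make the logic concrete, here is the same boolean expression pulled out into a standalone function. The function name and the plain-bool arguments are illustrative; in Galaxy the values come from self.app.config and from external_metadata_set_successfully(). With set_metadata_externally = True, the branch is taken only when the external step reports failure and retry_metadata_internally is also True:

```python
from itertools import product


def sets_metadata_internally(set_metadata_externally, external_ok, retry_metadata_internally):
    """Mirror of the JobWrapper.finish() conditional: True means dataset.set_meta()
    would run inside the job runner process (i.e. on the Galaxy server)."""
    return (not set_metadata_externally) or ((not external_ok) and retry_metadata_internally)


if __name__ == "__main__":
    # Print the full truth table for the three inputs.
    for ext, ok, retry in product([True, False], repeat=3):
        print("externally=%-5s external_ok=%-5s retry=%-5s -> internal=%s"
              % (ext, ok, retry, sets_metadata_internally(ext, ok, retry)))
```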

Somehow, this conditional is being entered.  Since set_metadata_externally is 
set to True, presumably the problem is external_metadata_set_successfully() is 
returning False and retry_metadata_internally is set to True.  If you leave 
behind the relevant job files (cleanup_job = never) and have a look at the PBS 
and metadata outputs you may be able to see what's happening.  Also, you'll 
want to set retry_metadata_internally = False.
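A quick way to double-check the settings involved is to read them straight out of the config file. This is only a sketch under stated assumptions. It assumes the options live in the [app:main] section of universe_wsgi.ini. The fallback defaults (False, True, "always") are assumptions about what Galaxy uses when an option is absent, so verify them against lib/galaxy/config.py. Python's getboolean may also accept slightly different spellings than Galaxy's own parser does:

```python
import configparser


def metadata_settings(ini_text, section="app:main"):
    """Return the metadata-related settings from Galaxy-style ini text.

    Assumes the options sit in [app:main]; the fallback defaults are assumptions.
    """
    parser = configparser.ConfigParser(interpolation=None)  # skip %(here)s-style expansion
    parser.read_string(ini_text)
    sect = parser[section]
    return {
        "set_metadata_externally": sect.getboolean("set_metadata_externally", fallback=False),
        "retry_metadata_internally": sect.getboolean("retry_metadata_internally", fallback=True),
        "cleanup_job": sect.get("cleanup_job", fallback="always"),
    }


if __name__ == "__main__":
    sample = """
[app:main]
set_metadata_externally = True
retry_metadata_internally = False
cleanup_job = never
"""
    print(metadata_settings(sample))
```

To check the real file, pass it the contents of universe_wsgi.ini instead of the sample string.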

Namely, try adding the following right above that conditional:

# Log each value together with its type, just above the conditional in JobWrapper.finish()
log.debug('#### %s: %s' % (type(self.app.config.set_metadata_externally), self.app.config.set_metadata_externally))
ext_ok = self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session )
log.debug('#### %s: %s' % (type(ext_ok), ext_ok))
log.debug('#### %s: %s' % (type(self.app.config.retry_metadata_internally), self.app.config.retry_metadata_internally))
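Logging the type alongside the value guards against one specific trap. If any of these settings were ever left as a raw string instead of being converted to a bool, the string "False" would still count as true in the conditional. The small illustration below uses `as_bool`, a hypothetical helper that is not Galaxy's own function:

```python
def as_bool(value):
    """Hypothetical helper: interpret a config value the way a human would read it."""
    return str(value).strip().lower() in ("true", "yes", "on", "1")


if __name__ == "__main__":
    raw = "False"  # what an unconverted config value would look like
    print(type(raw), bool(raw))  # <class 'str'> True  (any non-empty string is truthy)
    print(as_bool(raw))          # False
```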

I am guessing
self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session )
is returning False and self.app.config.retry_metadata_internally is True, so
we'd then need to determine why external metadata setting is failing for this job.
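If the debug output isn't conclusive, another check is to record where the metadata code actually runs. The sketch below is a hypothetical helper you could call at the top of the Bam datatype's set_meta(), assuming that is the method that reaches the samtools Popen call. It logs the host name, the PID, and the recent call stack. That tells you whether set_meta() was reached from JobWrapper.finish() on the server or from the external set_metadata.sh step on a node:

```python
import logging
import os
import socket
import traceback

log = logging.getLogger(__name__)


def log_execution_context(label):
    """Hypothetical diagnostic: log host, PID and caller stack for `label`."""
    summary = "#### %s: host=%s pid=%d" % (label, socket.gethostname(), os.getpid())
    log.debug(summary)
    # The most recent frames show who called us, e.g. JobWrapper.finish() vs. the external metadata script.
    log.debug("#### %s stack:\n%s", label, "".join(traceback.format_stack(limit=10)))
    return summary


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(log_execution_context("Bam.set_meta"))
```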

--nate

> 
> 
> On Fri, Jan 20, 2012 at 11:43 AM, Shantanu Pavgi <pa...@uab.edu> wrote:
> 
> Just wanted to add that we have consistently seen this issue of 'samtools 
> index' running locally on our install. We are using SGE scheduler. Thanks for 
> pointing out details in the code Nate.
> 
> --
> Shantanu.
> 
> 
> 
> On Jan 20, 2012, at 9:35 AM, Nate Coraor wrote:
> 
> > On Jan 18, 2012, at 11:54 AM, Ryan Golhar wrote:
> >
> >> Nate - Is there a specific place in the Galaxy code that forks the 
> >> samtools index on bam files on the cluster or the head node?  I really 
> >> need to track this down.
> >
> > Hey Ryan,
> >
> > Sorry it's taken so long; I've been pretty busy.  The relevant code is in 
> > galaxy-dist/lib/galaxy/datatypes/binary.py, in the Bam class.  When Galaxy 
> > runs a tool, it creates a Job, which is placed inside a JobWrapper in 
> > lib/galaxy/jobs/__init__.py.  After the job execution is complete, the 
> > JobWrapper.finish() method is called, which contains:
> >
> >        if not self.app.config.set_metadata_externally or \
> >           ( not self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session ) \
> >             and self.app.config.retry_metadata_internally ):
> >            dataset.set_meta( overwrite = False )
> >
> > Somehow, this conditional is being entered.  Since set_metadata_externally 
> > is set to True, presumably the problem is 
> > external_metadata_set_successfully() is returning False and 
> > retry_metadata_internally is set to True.  If you leave behind the relevant 
> > job files (cleanup_job = never) and have a look at the PBS and metadata 
> > outputs you may be able to see what's happening.  Also, you'll want to set 
> > retry_metadata_internally = False.
> >
> > --nate
> >
> >>
> >> On Fri, Jan 13, 2012 at 12:54 PM, Ryan Golhar 
> >> <ngsbioinformat...@gmail.com> wrote:
> >> I re-uploaded 3 BAM files using the "Upload system file paths" option.
> >> runner0.log shows:
> >>
> >> galaxy.jobs DEBUG 2012-01-13 12:50:08,442 dispatching job 76 to pbs runner
> >> galaxy.jobs INFO 2012-01-13 12:50:08,555 job 76 dispatched
> >> galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,697 (76) submitting file 
> >> /home/galaxy/galaxy-dist-9/database/pbs/76.sh
> >> galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,697 (76) command is: 
> >> python /home/galaxy/galaxy-dist-9/tools/data_source/upload.py 
> >> /home/galaxy/galaxy-dist-9 /home/galaxy/galaxy-dist-9/datatypes_conf.xml 
> >> /home/galaxy/galaxy-dist-9/database/tmp/tmpqrVYY7         
> >> 208:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_208_files:None
> >>          
> >> 209:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_209_files:None
> >>          
> >> 210:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_210_files:None;
> >>  cd /home/galaxy/galaxy-dist-9; /home/galaxy/galaxy-dist-9/set_metadata.sh 
> >> ./database/files ./database/tmp . datatypes_conf.xml 
> >> ./database/job_working_directory/76/galaxy.json
> >> galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:08,699 (76) queued in 
> >> default queue as 114.localhost.localdomain
> >> galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:50:09,037 
> >> (76/114.localhost.localdomain) PBS job state changed from N to R
> >> galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:51:09,205 
> >> (76/114.localhost.localdomain) PBS job state changed from R to E
> >> galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:51:10,206 
> >> (76/114.localhost.localdomain) PBS job state changed from E to C
> >> galaxy.jobs.runners.pbs DEBUG 2012-01-13 12:51:10,206 
> >> (76/114.localhost.localdomain) PBS job has completed successfully
> >>
> >> 76.sh shows:
> >> [galaxy@bic pbs]$ more 76.sh
> >> #!/bin/sh
> >> GALAXY_LIB="/home/galaxy/galaxy-dist-9/lib"
> >> if [ "$GALAXY_LIB" != "None" ]; then
> >>    if [ -n "$PYTHONPATH" ]; then
> >>        export PYTHONPATH="$GALAXY_LIB:$PYTHONPATH"
> >>    else
> >>        export PYTHONPATH="$GALAXY_LIB"
> >>    fi
> >> fi
> >> cd /home/galaxy/galaxy-dist-9/database/job_working_directory/76
> >> python /home/galaxy/galaxy-dist-9/tools/data_source/upload.py /home/galaxy/galaxy-dist-9 /home/galaxy/galaxy-dist-9/datatypes_conf.xml /home/galaxy/galaxy-dist-9/database/tmp/tmpqrVYY7 208:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_208_files:None 209:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_209_files:None 210:/home/galaxy/galaxy-dist-9/database/job_working_directory/76/dataset_210_files:None; cd /home/galaxy/galaxy-dist-9; /home/galaxy/galaxy-dist-9/set_metadata.sh ./database/files ./database/tmp . datatypes_conf.xml ./database/job_working_directory/76/galaxy.json
> >>
> >> Right as the job ended, I checked the job output files:
> >>
> >> [galaxy@bic pbs]$ ll
> >> total 4
> >> -rw-rw-r-- 1 galaxy galaxy 950 Jan 13 12:50 76.sh
> >> [galaxy@bic pbs]$ ll
> >> total 4
> >> -rw------- 1 galaxy galaxy   0 Jan 13 12:50 76.e
> >> -rw------- 1 galaxy galaxy   0 Jan 13 12:50 76.o
> >> -rw-rw-r-- 1 galaxy galaxy 950 Jan 13 12:50 76.sh
> >>
> >> samtools is now running on the head node.
> >>
> >>
> >> Where does Galaxy decide how to run samtools?  Maybe I can add a check of 
> >> some sort to see what's going on?
> >>
> >>
> >> On Fri, Jan 13, 2012 at 10:53 AM, Nate Coraor <n...@bx.psu.edu> wrote:
> >> On Jan 12, 2012, at 11:41 PM, Ryan Golhar wrote:
> >>
> >>> Any ideas as to how to fix this?  We are interested in using Galaxy to 
> >>> host all our NGS data.  If indexing on the head node is going to happen, 
> >>> then this is going to be an extremely slow process.
> >>
> >> Could you post the contents of 
> >> /home/galaxy/galaxy-dist-9/database/pbs/62.sh ?
> >>
> >> Although I have to admit this is really baffling.  The presence of this 
> >> line without an error:
> >>
> >>   galaxy.datatypes.metadata DEBUG 2012-01-11 10:22:40,162 Cleaning up 
> >> external metadata files
> >>
> >> indicates that metadata was set externally and the relevant metadata files 
> >> were present on disk.
> >>
> >> --nate
> >>
> >>
> >>
> >
> >
> > ___________________________________________________________
> > Please keep all replies on the list by using "reply all"
> > in your mail client.  To manage your subscriptions to this
> > and other Galaxy lists, please use the interface at:
> >
> >  http://lists.bx.psu.edu/
> 
> 

