It looks like if I set 'retry_metadata_internally = False' it stops trying
to index them on the queue node. The datasets get added into the library,
without a BAM index file, but without error.
I guess the index files can be generated on demand later on.
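A minimal sketch of what "on demand" could look like, outside of Galaxy. This is not how Galaxy stores BAM index metadata. It assumes samtools is on the PATH and that the index is written next to the BAM file as <file>.bam.bai. The helper names (bam_index_path, needs_index, index_command, ensure_index) are hypothetical and made up for illustration.

```python
import os
import subprocess


def bam_index_path(bam_path):
    """Return the assumed index location: the BAM path plus '.bai'."""
    return bam_path + ".bai"


def needs_index(bam_path):
    """True if the index is missing or older than the BAM file."""
    bai = bam_index_path(bam_path)
    if not os.path.exists(bai):
        return True
    return os.path.getmtime(bai) < os.path.getmtime(bam_path)


def index_command(bam_path):
    """Build the 'samtools index' command line (assumes samtools is on PATH)."""
    return ["samtools", "index", bam_path, bam_index_path(bam_path)]


def ensure_index(bam_path, run=subprocess.check_call):
    """Run the indexer only when needed; return True if indexing ran.

    'run' is injectable so the command could be handed to a cluster
    submission wrapper instead of being executed locally.
    """
    if not needs_index(bam_path):
        return False
    run(index_command(bam_path))
    return True
```

Calling ensure_index("/path/to/sample.bam") just before a tool needs the index would then build it at most once per file, and passing a different 'run' callable is one way to push the work onto a cluster node rather than the job handler.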
On Mon, Jan 28, 2013 at 12:42 PM, Greg Von Kuster <g...@bx.psu.edu> wrote:
> Hi Kyle,
> I'm hoping I can help you a bit on this, although I am not very familiar
> with the code that is producing this behavior. Your previous reply
> mentions the following:
> During job cleanup,
> galaxy.jobs.__init__.py:412, because
> external_metadata_set_successfully returns false.
> An external set_metadata.sh job was run, but it doesn't seem to call
> samtools. Maybe if I figure out why set_metadata.sh isn't working,
> this problem will go away.
> Based on your comments, there are a few things you can do:
> 1. If setting external metadata results in an error, the error should be
> printed out in your paster log. Do you see anything relevant there?
> 2. You also may be able to discover the error if you perform the following
> sql manually - make sure you have the correct job_id:
> select filename_results_code from job_external_output_metadata where job_id = <job_id>;
> 3. Make sure you have the following config setting uncommented and set to
> False in your universe_wsgi.ini (the default is set to True):
> # Although it is fairly reliable, setting metadata can occasionally fail. In
> # these instances, you can choose to retry setting it internally or leave it in
> # a failed state (since retrying internally may cause the Galaxy process to be
> # unresponsive). If this option is set to False, the user will be given the
> # option to retry externally, or set metadata manually (when possible).
> retry_metadata_internally = False
> Let me know if any of this helps you resolve the problem, and if not,
> we'll figure out next steps if possible.
> Greg Von Kuster
> On Jan 24, 2013, at 4:36 PM, Kyle Ellrott wrote:
> I'm willing to put in the coding time, but I'd need some pointers on the
> best way to go about making the changes.
> On Wed, Jan 23, 2013 at 6:35 PM, Anthonius deBoer <thondeb...@me.com> wrote:
>> I also second this request to get it addressed. (Where can we vote on bug
>> fixes?! :) It is very weird that samtools is run on the local machine,
>> and that it even does the indexing sequentially...
>> On Jan 23, 2013, at 03:28 PM, Kyle Ellrott <kellr...@soe.ucsc.edu> wrote:
>> I'm currently in the process of loading (path paste) a large library of
>> BAM files (>10000) into the shared Data Libraries of our local galaxy
>> installation, but I'm finding this process to be very slow.
>> I'm doing a path paste, and not actually copying the files. I have
>> disabled local running of 'upload1', so that it will run on the cluster,
>> and set 'set_metadata_externally' to true.
>> It looks like the job handlers are calling 'samtools index' directly.
>> Looking through the code, that seems to happen in galaxy/datatypes/binary
>> in Bam.dataset_content_needs_grooming, where it calls 'samtools index' and
>> then waits.
>> What would be the most efficient way to start changing the code so that
>> this process can be done by an external script, at a deferred time out on
>> the cluster?
>> Please keep all replies on the list by using "reply all"
>> in your mail client. To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at: