On Thu, Aug 8, 2013 at 2:38 AM, John Chilton <chil...@msi.umn.edu> wrote:
> On Wed, Aug 7, 2013 at 3:33 PM, Ganote, Carrie L <cgan...@iu.edu> wrote:
>> Hi John,
>> That was it. I feel silly. I still have a lot of teeth to cut on
>> Python!
>> I saw the parallel tags in the Blast tool and was very intrigued,
>> but couldn't find reference to it in the read-the-docs or on the
>> Galaxy wiki. Perhaps there is some documentation of this that I missed?
> No, I don't think there is documentation, unless you count the code base
> and the mailing list archive. I think setting use_tasked_jobs to True in
> universe_wsgi.ini might be all you need to do to start splitting such
> BLAST inputs. I think the parallelism tag in the tool file describes
> how to split the inputs.

Yes, there are two basic forms of splitting: into chunks of a set
size (this is what the BLAST tools and my other wrappers use; for
FASTA files it is a given number of sequences per chunk), or into
a target number of parts.
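For reference, the relevant tag in the BLAST+ tool XML looks
something along these lines (quoting from memory, so treat the
attribute values as illustrative rather than definitive):

```xml
<!-- Split the "query" FASTA input into chunks of up to 1000
     sequences, run each chunk as a task, then merge the
     per-chunk results back into the single expected output. -->
<parallelism method="multi" split_inputs="query"
             split_mode="to_size" split_size="1000"
             merge_outputs="output1"></parallelism>
```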

> # This enables splitting of jobs into tasks, if specified by the
> particular tool config.
> # This is a new feature and not recommended for production servers yet.
> #use_tasked_jobs = False
> I don't use this functionality (at least not in this fashion) so I
> don't have a lot of advice. Otherwise, if you have an AMQP setup
> working, you should probably just stick with it; that sounds like
> a perfectly good way to go.
> -John

>> In our case, the python splitting program is doing this:
>> * Take the blast query
>> * Split the sequences up
>> * For each sequence, submit the query and the command
>>   to a queue on a RabbitMQ server (Consumers are set up
>>   to listen for queries and then run the jobs).
>> * Write each result to a temp file
>> * When all of the sequence jobs are finished, concat the
>>   files back in the correct order and write to the output file
>>   Galaxy expects
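In outline, the split-and-merge bookkeeping from those steps looks
something like this (the RabbitMQ publish/consume part is left out;
in practice a client library such as pika would handle that, and
the function names here are just made up for the sketch):

```python
def split_fasta(text):
    """Split a multi-record FASTA string into one chunk per sequence.

    Each chunk keeps its header line, so it can be submitted as an
    independent BLAST query and the results re-assembled in order.
    """
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            chunks.append(line + "\n")
        elif chunks:
            chunks[-1] += line + "\n"
    return chunks


def merge_in_order(results):
    """Concatenate per-chunk results back in submission order,
    giving the single output file Galaxy expects."""
    return "".join(results)
```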

That's pretty much what the BLAST+ wrappers do already
via Galaxy's parallel / task splitting. When your cluster is
not under full load, this gives faster turnaround for individual
jobs. The downside is more IO, which makes the cluster as a
whole less productive if it is normally under heavy usage.

We use use_tasked_jobs = True on our Galaxy instance.

>> I made a wrapper for this splitter and it works fine on its own.
>> Now I'm trying to add this functionality (run on AMQP) as a
>> user-available option on the Blast tool. So for my dynamic
>> runner, I need to know whether to send the job to DRMAA
>> or to this AMQP python script. Hopefully that makes more
>> sense...
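For the dynamic runner decision itself, the rule can be as simple
as a function that inspects the tool parameters and returns a
destination id. The parameter name and destination ids below are
made up for illustration; the real Galaxy dynamic-rule API passes
richer objects than a plain dict:

```python
# Hypothetical dynamic rule: route the job to an AMQP-backed
# destination when the user ticked a (made-up) "use_amqp" option,
# otherwise fall back to the DRMAA cluster destination.
def choose_destination(params):
    if params.get("use_amqp") in ("true", True):
        return "amqp_blast"      # hypothetical AMQP destination id
    return "drmaa_cluster"       # hypothetical DRMAA destination id
```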

If using the parallel / task splitting as-is doesn't work, I
would suggest re-using the Galaxy datatype definition classes
and their split/merge methods, which for many formats are
non-trivial. For instance, merging XML files needs a bit more
care than simple concatenation, and this work has already been
done for BLAST XML.
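To show why XML merging is the tricky case: you can't just
concatenate the files, because each part carries its own header and
footer. A naive sketch of the idea (Galaxy's BlastXml datatype does
this far more robustly; this string-slicing version is only for
illustration) keeps the first part's header, the last part's footer,
and splices the <Iteration> blocks together:

```python
def merge_blast_xml(parts):
    """Naively merge BLAST XML outputs from split tasks.

    Keeps everything before the first <Iteration> of the first part
    (the header), everything after the last </Iteration> of the last
    part (the footer), and concatenates the <Iteration> blocks from
    every part in between.
    """
    open_tag, close_tag = "<Iteration>", "</Iteration>"
    head = parts[0][:parts[0].index(open_tag)]
    tail = parts[-1][parts[-1].rindex(close_tag) + len(close_tag):]
    body = []
    for part in parts:
        start = part.index(open_tag)
        end = part.rindex(close_tag) + len(close_tag)
        body.append(part[start:end])
    return head + "".join(body) + tail
```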

But ideally could you integrate AMQP as an alternative
cluster backend which can be called instead of DRMAA etc?

