The BLAST+ binaries support multi-threaded operation, which is controlled
via the $GALAXY_SLOTS environment variable. Galaxy should set this
automatically from your job runner settings, which allows you to
(for example) allocate four cores to each BLAST job.
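
As a rough illustration (not the wrappers' actual code), a script could
pick up the slot count like this. The function name and the fallback of
one thread are my own assumptions; "-num_threads" is the standard BLAST+
option.

```python
import os


def blast_command(program, query, db, default_threads=1):
    """Build a BLAST+ argument list that honours $GALAXY_SLOTS.

    Hypothetical helper for illustration only; falls back to
    default_threads when Galaxy has not set the variable.
    """
    threads = os.environ.get("GALAXY_SLOTS", str(default_threads))
    return [program, "-query", query, "-db", db, "-num_threads", threads]
```

With four slots allocated by the job runner, the resulting command line
would end in "-num_threads 4".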

In addition, the BLAST+ wrappers support high-level parallelism
by task splitting if "use_tasked_jobs = True" is set in your
"universe_wsgi.ini" configuration file. Essentially, the FASTA query
file is broken up into batches of 1000 sequences, a separate
BLAST child job is run for each batch, and then the BLAST output
files are merged (in order). This is transparent to the end user.
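
To make the splitting step concrete, here is a minimal sketch of batching
FASTA records. It is not Galaxy's implementation; the function name and
the line-based input are assumptions for illustration.

```python
def split_fasta(lines, batch_size=1000):
    """Yield lists of lines, each holding up to batch_size FASTA records.

    Hypothetical helper: a record starts at a line beginning with ">"
    and runs until the next such line.
    """
    batch, count = [], 0
    for line in lines:
        if line.startswith(">"):
            if count == batch_size:
                # Current batch is full, so emit it before starting a new one.
                yield batch
                batch, count = [], 0
            count += 1
        batch.append(line)
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

Because the batches come out in input order, concatenating the per-batch
results in the same order preserves the original query order.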

Each tool enables this via its XML file, e.g.

<parallelism method="multi" split_inputs="query" split_mode="to_size"
split_size="1000" merge_outputs="output1"></parallelism>

This requires splitting support in the FASTA input datatypes, and
merging support in the selected output datatype (e.g. BLAST XML,
tabular, etc.). Both are implemented as methods on the Python datatype
classes.
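
For tabular output the merge can be as simple as concatenating the
per-chunk files in order. The sketch below assumes exactly that and uses
a made-up function name; it is not the signature of Galaxy's datatype
methods. BLAST XML needs more care, since naive concatenation of
several XML documents would not give one valid XML file.

```python
import shutil


def merge_tabular(part_paths, merged_path):
    """Concatenate per-chunk tabular files, in the order given.

    Hypothetical helper for illustration; the caller is responsible
    for passing the parts in the original query order.
    """
    with open(merged_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                # Stream each part so large outputs are not held in memory.
                shutil.copyfileobj(part, out)
```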

It would be interesting to see if any of John's work on collections
of files of the same type might fit nicely with this approach (and
thus avoid the disk IO overhead of the merge step?).

Peter


On Mon, Feb 10, 2014 at 1:56 AM, Ketan Maheshwari
<ketancmaheshw...@gmail.com> wrote:
> Thanks Dannon for the reference. I checked out the tool and installed it
> from the Tool Shed on my local Galaxy instance. I also checked out the
> related paper, which says that the BLAST executables run in parallel by
> partitioning the input files into fragments and running batches in
> parallel. That sounds cool. I browsed the code but could not find the
> exact mechanism. Is the parallelism at the workflow level (aka branch
> parallelism), or is it at the tool level, that is, the tool invokes
> parallel code?
>
> Thanks,
> Ketan
>
>
>> On Thu, Feb 6, 2014 at 9:42 AM, Dannon Baker <dannon.ba...@gmail.com>
>> wrote:
>>>
>>> Ketan,
>>>
>>> Have you taken a look at Galaxy's built-in parallelism framework?  For a
>>> great current example of a tool using this, look at Peter's NCBI BLAST+
>>> wrappers.  https://github.com/peterjc/galaxy_blast
>>>
>>> -Dannon