Re: [galaxy-dev] Parallelism (job splitting) for ncbi_blast_plus running through CloudMan

David Kovalic Tue, 03 May 2016 13:27:06 -0700

Peter,

We made the modification to the config file, restarted Galaxy, and things
seem to be working from the Galaxy end. We see sub-job directories being
created in /mnt/galaxy/tmp/job_working_directory. We think all of the
required job chunks have been created (i.e. there are now total
sequences/1000 sub-job directories, and no more are being created).
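
A rough way to sanity-check that count is sketched below. It assumes the
split rounds up, so a final partial batch still gets its own sub-job; that
rounding behaviour is an assumption, not something confirmed in this
thread. The function names are hypothetical helpers, not part of Galaxy.

```python
import math


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def expected_chunks(total_sequences, split_size=1000):
    """Estimate the number of batches of at most split_size sequences.

    Assumes a final partial batch is still run as its own sub-job.
    """
    return math.ceil(total_sequences / split_size)
```

For example, expected_chunks(count_fasta_records("query.fasta")) gives the
number of sub-job directories to expect for a hypothetical query.fasta, and
expected_chunks(20000) gives 20.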


Now we have what may be a CloudMan question: our working cluster has a head
node and 4 workers. The head node is loaded up but the workers are idle. I
would have thought jobs should be pushing out to the workers but we don't
see any load on these machines.

Any advice? Thanks.

David

PS. What is the path of the file that contains the split_size="1000"
configuration?



On Tue, May 3, 2016 at 2:19 PM David Kovalic <[email protected]> wrote:

> Peter,
>
> Thanks, I didn't see that, I was reading the paper and searching online.
>
> Appreciate the help, we'll give it a go!
>
> David
>
>
> On Tue, May 3, 2016 at 2:16 PM Peter Cock <[email protected]>
> wrote:
>
>> Hi David,
>>
>> The NCBI BLAST+ wrappers have a <parallelism> tag set up,
>> which becomes active if you have use_tasked_jobs = True in
>> your config/galaxy.ini file (aka universe_wsgi.ini).
>>
>> Specifically, the wrappers use this:
>>
>> <!-- If job splitting is enabled, break up the query file into parts -->
>> <parallelism method="multi" split_inputs="query" split_mode="to_size"
>> split_size="1000" merge_outputs="output1" />
>>
>> This is hard coded to break up the query FASTA file into batches
>> of 1000 sequences (e.g. a transcriptome of 20k genes becomes
>> 20 jobs), which has worked nicely on our cluster.
>>
>> Separately, each job uses -num_threads "\${GALAXY_SLOTS:-8}"
>> in the command line string, i.e. uses the $GALAXY_SLOTS
>> environment variable (set via the Galaxy job configuration), or
>> if not set, defaults to using 8 threads.
>>
>> I've essentially rephrased the README file here - did you see
>> that, or does it need more information added?
>>
>> Thanks,
>>
>> Peter
>>
>>
>> On Tue, May 3, 2016 at 6:58 PM, David Kovalic <[email protected]>
>> wrote:
>> > Hello,
>> >
>> > We would like to split fasta query files and run multiple concurrent
>> > jobs to minimize our processing wall clock time for large jobs.
>> >
>> > After chatting with folks at GCC 2015 I understand this is possible; my
>> > problem is I can't find instructions on how to configure
>> > CloudMan/ncbi_blast_plus to do this. For those of you who know me it
>> > probably goes without saying that I can't figure it out myself ;)
>> >
>> > Peter/Enis/others, can you help us out with this question?
>> >
>> > Thanks,
>> >
>> > David
>> >
>> >
>> > ___________________________________________________________
>> > Please keep all replies on the list by using "reply all"
>> > in your mail client.  To manage your subscriptions to this
>> > and other Galaxy lists, please use the interface at:
>> >   https://lists.galaxyproject.org/
>> >
>> > To search Galaxy mailing lists use the unified search at:
>> >   http://galaxyproject.org/search/mailinglists/
>>
>

