On Thu, Aug 8, 2013 at 2:38 AM, John Chilton <chil...@msi.umn.edu> wrote:
> On Wed, Aug 7, 2013 at 3:33 PM, Ganote, Carrie L <cgan...@iu.edu> wrote:
>> Hi John,
>>
>> That was it. I feel silly. I still have a lot of tooth-cutting to do on
>> python!
>>
>> I saw the parallel tags in the Blast tool and was very intrigued,
>> but couldn't find reference to it in the read-the-docs or on the
>> Galaxy wiki. Perhaps there is some documentation of this that I missed?
>
> No I don't think there is documentation unless you count the code base
> the mailing list archive. I think setting use_tasked_jobs to True in
> universe_wsgi.ini might be all you need to do to start splitting such
> blast inputs. I think the parallelism tag in the tool file describes
> how to split the inputs.

Yes, there are two basic forms: splitting into chunks of a set size
(which is what the BLAST tools and my other wrappers use; for FASTA
files this is a given number of sequences), or splitting into a target
number of parts.

> # This enables splitting of jobs into tasks, if specified by the
> particular tool config.
> # This is a new feature and not recommended for production servers yet.
> #use_tasked_jobs = False
>
> I don't use this functionality (at least not in this fashion) so I
> don't have a lot of advice. Otherwise, if you have a AMQP thing
> working you should probably just stick with that sounds like a
> perfectly good way to go.
>
> -John

>> In our case, the python splitting program is doing this:
>> * Take the blast query
>> * Split the sequences up
>> * For each sequence, submit the query and the command
>> to a queue on a RabbitMQ server (Consumers are set up
>> to listen for queries and then run the jobs).
>> * Write each result to a temp file
>> * When all of the sequence jobs are finished, concat the
>> files back in the correct order and write to the output file
>> Galaxy expects

That's pretty much what the BLAST+ wrappers do already via Galaxy's
parallel / task splitting. When your cluster is not under full load,
this gives faster processing for individual jobs. The downside is more
IO, which makes the cluster as a whole less productive (if it was
normally under high usage). We use use_tasked_jobs = True on our Galaxy
instance.

>> I made a wrapper for this splitter and it works fine on its own.
>> Now I'm trying to add this functionality (run on AMQP) as a
>> user-available option on the Blast tool. So for my dynamic
>> runner, I need to know whether to send the job to DRMAA
>> or to this AMQP python script. Hopefully that makes more
>> sense...

If using the parallel / task splitting as it is doesn't work, I would
suggest trying to re-use the Galaxy datatype definition classes and
their split/merge methods (splitting and merging is non-trivial for
many formats).
For instance, merging XML files needs a bit more care, and this work is
done for BLAST XML.

But ideally could you integrate AMQP as an alternative cluster backend
which can be called instead of DRMAA etc?

Regards,

Peter
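
To make the two split forms discussed in this thread concrete (chunks of a set size versus a target number of parts), along with the in-order merge of per-chunk results, here is a minimal Python sketch. It is not Galaxy's actual split/merge code; the function names (read_fasta, split_to_size, split_to_parts, merge_in_order) are made up for illustration, and the merge assumes plain text output that can simply be concatenated.

```python
from itertools import islice


def read_fasta(lines):
    """Yield each FASTA record as a list of lines (header plus sequence)."""
    record = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield record
            record = []
        if line.strip():  # skip blank lines
            record.append(line)
    if record:
        yield record


def split_to_size(records, chunk_size):
    """Split into chunks holding at most chunk_size records each."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def split_to_parts(records, parts):
    """Split into at most `parts` contiguous chunks of near-equal size."""
    records = list(records)
    base, extra = divmod(len(records), parts)
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional record.
        size = base + (1 if i < extra else 0)
        if size:
            yield records[start:start + size]
        start += size


def merge_in_order(results):
    """Concatenate per-chunk outputs (keyed by chunk index) in input order."""
    return "".join(results[i] for i in sorted(results))
```

Keying the results by chunk index means the jobs can finish in any order and the output still matches the order of the input sequences. Simple concatenation like this would only suit line-oriented output such as tabular BLAST results; BLAST XML needs a format-aware merge.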