+1. This is especially useful for us, since our hardware-accelerated algorithms have limits on input size.
On 12-05-03 9:51 AM, "Peter Cock" <p.j.a.c...@googlemail.com> wrote:

> Hello all,
>
> Currently the Galaxy experimental task splitting code allows splitting
> into N chunks, e.g. 8 parts, with:
>
> <parallelism method="multi" split_mode="number_of_parts" split_size="8" ... />
>
> Or, into chunks of at most size N (units dependent on the file type, e.g.
> lines in a tabular file or number of sequences in FASTA/FASTQ), e.g. at
> most 1000 sequences:
>
> <parallelism method="multi" split_mode="to_size" split_size="1000" ... />
>
> As an aside, I found it confusing that the meaning of the "split_size"
> attribute depends on the "split_mode" (number of jobs, or size of jobs).
>
> I would prefer to be able to set both sizes - in this case, tell Galaxy
> to try to use at least 8 parts, each of at most 1000 sequences. Thus in a
> BLAST task, initially the split would be (up to) eight ways:
>
> 8 queries => 8 jobs each with 1 query
> 80 queries => 8 jobs each with 10 queries
> 800 queries => 8 jobs each with 100 queries
> 8000 queries => 8 jobs each with 1000 queries
>
> Then, once the max chunk size comes into play, you'd just get more jobs:
>
> 9000 queries => 9 jobs each with 1000 queries
> 10000 queries => 10 jobs each with 1000 queries
> 20000 queries => 20 jobs each with 1000 queries
> etc.
>
> The appeal of this is that it takes advantage of parallelism for both
> small jobs (under 100 queries) and large jobs (1000s of queries), while
> still being able to impose a maximum size on each cluster job.
>
> The problem is that this requires changing the XML tags, and getting rid
> of the current two modes in favour of this combined one. Perhaps this:
>
> <parallelism method="multi" min_jobs="8" max_size="1000" ... />
>
> The jobs threshold isn't strictly a minimum - if you have N < 8 query
> sequences, you'd just have N jobs of 1 query each.
>
> Does this sound sufficiently general? The split code is still rather
> experimental, so I don't expect breaking the API to be a big issue
> (not many people are using it).
>
> Peter
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
> http://lists.bx.psu.edu/
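For illustration, here is a minimal Python sketch of the combined rule proposed in the quoted message: use up to `min_jobs` parts (never more parts than items), then add parts as needed so no part exceeds `max_size`. This is a sketch under stated assumptions, not Galaxy code: `plan_split` is a hypothetical helper, `min_jobs`/`max_size` are only the attribute names suggested in the message, and spreading leftover items evenly when the count does not divide exactly is an assumption here, since the message only gives evenly divisible examples.

```python
import math


def plan_split(n_items: int, min_jobs: int, max_size: int) -> list[int]:
    """Hypothetical helper: return how many items each job receives.

    Assumes the proposed min_jobs/max_size semantics; not a Galaxy API.
    """
    if n_items <= 0:
        return []
    if min_jobs < 1 or max_size < 1:
        raise ValueError("min_jobs and max_size must be positive")
    # Aim for min_jobs parts, but never more parts than there are items
    # (N < min_jobs queries gives N jobs of 1 query each).
    n_jobs = min(n_items, min_jobs)
    # Add extra jobs once the per-job cap max_size comes into play.
    n_jobs = max(n_jobs, math.ceil(n_items / max_size))
    # Assumption: spread items evenly; the first `extra` jobs get one more.
    base, extra = divmod(n_items, n_jobs)
    return [base + 1 if i < extra else base for i in range(n_jobs)]


if __name__ == "__main__":
    for n in (5, 80, 8000, 9000, 20000):
        jobs = plan_split(n, min_jobs=8, max_size=1000)
        print(n, "->", len(jobs), "jobs, largest", max(jobs))
```

Run with `min_jobs=8` and `max_size=1000`, this reproduces the BLAST examples from the message (e.g. 80 queries give 8 jobs of 10, and 9000 queries give 9 jobs of 1000). Because the job count is at least `ceil(n_items / max_size)`, an even spread can never push a single job above `max_size`.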