Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

Fields, Christopher J Thu, 16 Feb 2012 10:43:03 -0800

On Feb 16, 2012, at 12:24 PM, Peter Cock wrote:

> On Thu, Feb 16, 2012 at 4:28 PM, Peter Cock <[email protected]> wrote:
>> Hi Dan,
>> 
>> I think I need a little more advice - what is the role of the script
>> scripts/extract_dataset_part.py and the JSON files created
>> when splitting FASTQ files in lib/galaxy/datatypes/sequence.py,
>> and then used by the class' process_split_file method?
>> 
>> Why is there no JSON file created by the base data class in
>> lib/galaxy/datatypes/data.py and no method process_split_file?
>> 
>> Is the JSON thing part of a partial and unfinished rewrite of the
>> splitter code?
>> 
>> On the assumption that not all splitters bother with the JSON,
>> I am trying a little hack to scripts/extract_dataset_part.py to
>> abort silently if there is no JSON file:
>> https://bitbucket.org/peterjc/galaxy-central/changeset/ebe94a2c25c3
>> 
>> This seems to be working with my current attempt at a FASTA
>> splitter (not checked in yet, only partly implemented and tested).
> 
> I've checked in my FASTA splitting, which now seems to be
> working OK with my BLAST tests. So far this only does splitting
> into chunks of the requested number of sequences, rather than
> the option to split the whole file into a given number of pieces.
> https://bitbucket.org/peterjc/galaxy-central/changeset/416c961c0da9


Cool! Seems like a perfectly fine start. I guess you could grab the number of
sequences from the dataset (I'm guessing that is set upon import into
Galaxy), which would let you split the file into a given number of pieces.
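
For reference, splitting by a fixed number of sequences can be sketched roughly
as below. This is only an illustration, not the code in the changeset above:
the names iter_fasta_records and split_fasta are made up here, and it assumes
plain multi-line FASTA where every record starts with a ">" line.

```python
from itertools import islice


def iter_fasta_records(handle):
    """Yield each FASTA record (header plus its sequence lines) as one string.

    Hypothetical helper for illustration; lines before the first ">" header
    are ignored.
    """
    record = []
    for line in handle:
        if line.startswith(">"):
            if record:
                yield "".join(record)
            record = [line]
        elif record:
            record.append(line)
    if record:
        yield "".join(record)


def split_fasta(handle, chunk_size):
    """Yield lists of at most chunk_size records, in file order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    records = iter_fasta_records(handle)
    while True:
        chunk = list(islice(records, chunk_size))
        if not chunk:
            break
        yield chunk
```

Each chunk could then be written to its own file and handed to a separate
job. Splitting into a given number of pieces would need the total sequence
count first, so that chunk_size can be computed from it.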

> I also need to look at merging multiple BLAST XML outputs, but
> this is looking promising.
> 
> Peter

Yep, that's definitely one where a simple concatenation wouldn't work (though 
NCBI used to think so, years ago…)
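
Concatenation fails because each BLAST XML file is a complete document with
its own root element and header. A rough sketch of one way to merge them is
below, assuming the usual BlastOutput / BlastOutput_iterations / Iteration
layout. The function name merge_blast_xml is made up for illustration, and
the sketch keeps only the first file's header, so the inputs are assumed to
come from the same program and database.

```python
import xml.etree.ElementTree as ET


def merge_blast_xml(xml_strings):
    """Merge BLAST XML documents into one, keeping the first file's header.

    Hypothetical sketch: Iteration elements from later documents are appended
    to the first document and then renumbered. The DOCTYPE line is not
    preserved by ElementTree, and any per-file header differences are lost.
    """
    roots = [ET.fromstring(text) for text in xml_strings]
    if not roots:
        raise ValueError("no BLAST XML documents given")
    merged = roots[0]
    target = merged.find("BlastOutput_iterations")
    if target is None:
        raise ValueError("first document has no BlastOutput_iterations element")
    for root in roots[1:]:
        for iteration in root.findall("BlastOutput_iterations/Iteration"):
            target.append(iteration)
    # Renumber so that iteration numbers run 1..N across the merged file.
    for number, iteration in enumerate(target.findall("Iteration"), start=1):
        iter_num = iteration.find("Iteration_iter-num")
        if iter_num is not None:
            iter_num.text = str(number)
    return ET.tostring(merged, encoding="unicode")
```

Real output would probably also need the DOCTYPE restored and the
per-iteration statistics checked, which this sketch does not attempt.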

chris
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
