Yes, blat is very good at mRNA alignment.
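
Below is a minimal Python sketch of the target-splitting step described in the quoted thread. For one chromosome, it computes the overlapping (start, end) windows that each job would align against. The helper name `overlapping_chunks` and the default chunk size and overlap are illustrative assumptions, not values prescribed in the thread.

```python
def overlapping_chunks(seq_len, chunk_size=5_000_000, overlap=10_000):
    """Return half-open (start, end) windows covering [0, seq_len).

    Hypothetical helper: consecutive windows start (chunk_size - overlap)
    bases apart, so neighbouring windows share `overlap` bases.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    if seq_len <= 0:
        return []
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, seq_len)  # last window may be shorter
        chunks.append((start, end))
        if end == seq_len:
            break
        start += step
    return chunks


if __name__ == "__main__":
    # Example: windows for a 12 Mb sequence with the assumed defaults.
    for start, end in overlapping_chunks(12_000_000):
        print(start, end)
```

Because windows start `chunk_size - overlap` bases apart, any alignment whose genomic span is no longer than `overlap` falls entirely inside at least one window. An mRNA alignment that spans a very long intron can exceed any practical overlap, and would then be split across chunks.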

----- Original Message -----
From: "Peng Yu" <[email protected]>
To: "Hiram Clawson" <[email protected]>
Cc: [email protected]
Sent: Tuesday, June 15, 2010 6:01:21 PM GMT -08:00 Tijuana / Baja California
Subject: Re: [Genome] parallel blat

I know there are other much faster tools for ungapped or small-gap
alignment. But I think that blat is still the best one for aligning
mRNAs to the genome, where the alignments may have very large gaps
(introns). Am I correct about this?

On Tue, Jun 15, 2010 at 7:53 PM, Hiram Clawson <[email protected]> wrote:
> Absolutely.  Break your target genome into several hundred
> overlapping pieces, each on the order of 5 to 10 million bases or
> even smaller.  Partition your 10 million short sequences into several
> hundred multi-record FASTA files.  Run one job for each combination of
> target genome chunk and query FASTA file.  These are all separate processes.
>
> Please note that blat is not necessarily the best tool for short sequence
> alignment; other tools are much better suited to it.  See also:
>
> http://en.wikipedia.org/wiki/List_of_sequence_alignment_software
>
> --Hiram
>
> ----- Original Message -----
> From: "Peng Yu" <[email protected]>
> To: "Hiram Clawson" <[email protected]>
> Cc: [email protected]
> Sent: Tuesday, June 15, 2010 5:24:01 PM GMT -08:00 Tijuana / Baja California
> Subject: Re: [Genome] parallel blat
>
> I'm not sure I understand what you described, although I thought I did.
>
> Suppose I have 10 million short sequences to be aligned to the human
> genome. It makes sense to split the 10 million sequences into 10
> files (1 million each). Then I run 10 blat commands simultaneously.
> Each blat command will load all the chromosomes. Are you suggesting that I
> break all_human_chromosomes.list into a number of smaller lists?
>
> blat -t=dna -q=dna -tileSize=11 -stepSize=5 \
>     all_human_chromosomes.list short_seq0.fa short_seq0.psl
> ...
> blat -t=dna -q=dna -tileSize=11 -stepSize=5 \
>     all_human_chromosomes.list short_seq9.fa short_seq9.psl
>
>
> On Tue, Jun 15, 2010 at 7:14 PM, Hiram Clawson <[email protected]> wrote:
>> No, this is not what I described.  Each process loads only a tiny
>> portion of the target genome and a tiny portion of the query genome.
>> Nothing is duplicated between processes.  We regularly do this
>> with genomes here and can get perhaps 100,000 processes running
>> on a 1,000 CPU core supercomputer, completing a whole
>> genome-to-genome alignment in a few hours.  This is much
>> simpler and more efficient than trying to write a complicated
>> parallel program that would be difficult to operate
>> on a variety of operating systems.  The operating system
>> itself is already optimized to schedule the individual
>> processes it manages, so we don't have to duplicate that
>> complexity.
>
> --
> Regards,
> Peng
>



-- 
Regards,
Peng
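
Below is a minimal Python sketch of the one-job-per-pair scheme discussed in this thread, written under stated assumptions. It distributes query records round-robin into multi-record FASTA texts, then generates one blat command line for each (target chunk, query file) pair. The helper names, the round-robin policy, and file names such as `chunk000.fa` are hypothetical. The blat options are copied from the commands quoted above.

```python
from itertools import product


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_lines)
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)  # sequences may be wrapped over lines
    if header is not None:
        yield header, "".join(seq_lines)


def partition_records(records, n_parts):
    """Distribute records round-robin into n_parts lists (assumed policy)."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts = [[] for _ in range(n_parts)]
    for i, rec in enumerate(records):
        parts[i % n_parts].append(rec)
    return parts


def to_fasta(records):
    """Serialize (header, sequence) pairs as one multi-record FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


def job_commands(target_files, query_files,
                 options="-t=dna -q=dna -tileSize=11 -stepSize=5"):
    """Return one blat command line per (target chunk, query file) pair.

    Each command is meant to run as its own independent process.
    Output names combine the two input stems (hypothetical convention).
    """
    cmds = []
    for target, query in product(target_files, query_files):
        out = f"{target.rsplit('.', 1)[0]}__{query.rsplit('.', 1)[0]}.psl"
        cmds.append(f"blat {options} {target} {query} {out}")
    return cmds


if __name__ == "__main__":
    # Hypothetical chunk and query file names, for illustration only.
    for cmd in job_commands(["chunk000.fa", "chunk001.fa"], ["q0.fa", "q1.fa"]):
        print(cmd)
```

Each generated command is an independent process, which is what lets a batch scheduler spread the jobs across many CPU cores. If target chunks are extracted as separate sequences, the target coordinates in each PSL are relative to the chunk. They would need to be offset by the chunk start before results are merged.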

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
