Re: [Genome] Question about BLAT: The case which BLAT will miss true hits.

jp d Wed, 20 May 2009 09:12:14 -0700

Hi,
The sequence lies in a repeat region
and is repeat-masked by default.
Starting gfServer with -repMatch=1000000
got me about 1000 hits for that sequence.
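
To illustrate why an over-represented sequence can return no hits until
-repMatch is raised, here is a toy Python sketch. It is not BLAT's actual
code: the function names, the tiny tile size, and the "at least rep_match
occurrences" cutoff are assumptions made for illustration. It indexes
non-overlapping tiles of the target, drops any tile that occurs too often,
and then counts seed hits for a query.

```python
from collections import defaultdict


def build_tile_index(genome, tile_size, rep_match):
    """Index non-overlapping tiles of the genome (toy model).

    Tiles seen at least rep_match times are dropped, which mimics
    masking of over-used tiles. The exact cutoff is an assumption.
    """
    index = defaultdict(list)
    for pos in range(0, len(genome) - tile_size + 1, tile_size):
        index[genome[pos:pos + tile_size]].append(pos)
    return {tile: hits for tile, hits in index.items() if len(hits) < rep_match}


def seed_hits(index, query, tile_size):
    """Count indexed positions matched by any k-mer of the query."""
    total = 0
    for i in range(len(query) - tile_size + 1):
        total += len(index.get(query[i:i + tile_size], []))
    return total


if __name__ == "__main__":
    # Hypothetical target: one 8-base unit repeated 50 times plus a unique tail.
    genome = "ACGTTGCA" * 50 + "GGGGCCCCTTAA"
    query = "ACGTTGCA"
    for rep_match in (10, 1000):
        index = build_tile_index(genome, tile_size=4, rep_match=rep_match)
        print(rep_match, seed_hits(index, query, tile_size=4))
```

With the low cutoff the repeated tiles are removed from the index and the
query finds no seeds at all; with the high cutoff they stay and the query
seeds at every copy.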




--- On Wed, 5/20/09, Chi, Sung Wook <[email protected]> wrote:

> From: Chi, Sung Wook <[email protected]>
> Subject: [Genome] Question about BLAT: The case which BLAT will miss true 
> hits.
> To: [email protected]
> Date: Wednesday, May 20, 2009, 8:12 AM
> Dear UCSC genome browser,
> 
> Hi. My name is Sung Wook Chi, a PhD student at Rockefeller
> University.
> I've been using your BLAT program to map high-throughput
> sequence data from Solexa/Illumina to the genome without
> any problem.
> However, some people have objected to using BLAT instead of
> BLASTN, saying that BLAT may miss some matches on the genome.
> 
> So I'd like to ask your opinion on this issue, and I have
> attached the complaint below. I would greatly appreciate it
> if you could tell me your thoughts about this.
> 
> My point was that BLAT will miss true hits that most likely
> matter. What the 
> authors, and many people in the community, do not realize
> is that BLAT will 
> miss even exact matches and will miss them by a mile. Let
> me highlight this 
> last statement with the following 33 nucleotide sequence:
>     GAGCCACCATGTGGTTGCTGGGAATTGAACTCA
> The authors can verify, by going to www.ensembl.org and
> running BLAT with 
> default settings on the mouse genome that BLAT will find no
> hits whatsoever. 
> Now, if the authors select BLASTN with the default
> "Near-exact matches" they 
> will find a handful of hits in the current chromosome
> assemblies. If they 
> replace "Near-exact matches" with "Allow some local
> mismatch" and re-run 
> BLASTN, the number of hits that they will find will
> increase. Finally, 
> with "Allow some local mismatch" selected, they can hit
> 'Configure' and un-
> check the 'RepeatMasker' filter, then rerun: this will take
> a while to 
> complete; upon completion, BLASTN will report "20447
> alignments". Note that 
> even this number is an undercount: the above 33mer has
> nearly 8,000 exact, 
> full-length copies in the mouse genome, and an additional
> 24,000 copies with a 
> single-letter mismatch for a total of more than 32,000
> copies with at most one 
> mismatch anywhere along the length of the 33-mer. This
> means that BLAT (and BLASTN)
> underestimate the number of hits even if the allowed error
> is very small (my 
> experience is that independent of method used,
> high-throughput sequencing data 
> have substantially more than one base error in 33
> nucleotides of sequence). 
> Why does this point matter? Because BLAT's inherent
> inability will mean that 
> people cannot properly place all of their reads on the
> genome all of the time, 
> which will in turn affect the computed key estimates of false
> positives and false
> negatives for their method.
> 
> Thank you very much for your help and for your BLAT program.
> 
> Sincerely,
> 
> Chi, Sung Wook
> 
> 
> 
> Chi, Sung Wook
> -------------------------------------------------
> PhD Student, Tri-institutional Program in Computational
> Biology and Medicine.
> Laboratory of Molecular Neuro-oncology,
> The Rockefeller University.
> 1230 York Ave., Box 226
> New York, NY 10021
> 
> Tel(lab): 212-327-7461
> E-mail: [email protected]
>        
> 
> 
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
> 
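
The copy counts quoted above (exact copies versus copies with a single
mismatch) are the kind of figure that can be checked independently of any
aligner with a brute-force scan. The following Python sketch shows the idea
under stated assumptions: the function name is hypothetical, only the
forward strand is scanned, and the genome is assumed to fit in memory as a
plain string. It is far too slow for a whole mouse genome and is meant only
to make the definition of "at most one mismatch" concrete.

```python
def count_copies(genome, query, max_mismatches=1):
    """Return (exact, near) counts of forward-strand windows.

    exact: windows identical to the query.
    near:  windows with 1..max_mismatches substitutions (no indels).
    """
    exact = near = 0
    n = len(query)
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        mismatches = sum(a != b for a, b in zip(window, query))
        if mismatches == 0:
            exact += 1
        elif mismatches <= max_mismatches:
            near += 1
    return exact, near


if __name__ == "__main__":
    query = "GAGCCACCATGTGGTTGCTGGGAATTGAACTCA"  # the 33-mer from the message
    mutated = "T" + query[1:]                    # one substitution
    # Hypothetical toy target holding one exact and one single-mismatch copy.
    genome = "TTTT" + query + "CC" + mutated + "AAAA"
    print(count_copies(genome, query))
```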
