Hello,

The version of BLAT on our web site is slightly different from the one 
available for free download. Please double-check that you are using 
masked genomic sequence if you want to compare directly with web BLAT. I 
think this is the major cause of the differing number of alignments. 
Here is why:

I ran the query against the human genome (the one I am assuming you are 
using) using web BLAT and got the same six hits as a result.

The best hit chr17:55,925,487-55,925,588 is significantly better than 
the others. From bases 26 to 125 the match is identical with the exception 
of two gaps around bases 60-70. This best match location does not have 
overlap with any repetitive track annotations or human chained self 
alignments. It is in the intron region of two transcripts in the UCSC 
Genes track (high quality from RefSeq).

The second best match: chr19:58,193,657-58,193,721 represents less 
sequence, bases 57 to 126, but at a fairly high identity. This makes me 
think the alignment is finding a region of (partial) gene duplication 
since it also has some overlap with human chained self alignments. It is 
in the intron region of a transcript in the UCSC Genes track (the gene 
is annotated as non-coding).

The third (bases 29-64) and fourth (bases 58-82) alignments align at 
slightly weaker identity. No gene association. No repeat association.

The fifth (bases 28-51) and sixth (bases 80-102) alignments align at 
identities as good as or better than the third and fourth alignments, but 
both are in genomic regions that overlap with very weak LINE repeat 
elements (RepeatMasker tracks). No gene association.
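
To see which part of the query is not covered by any of the six web BLAT alignments, the query ranges quoted above can be merged. This is only a minimal sketch: the function name is hypothetical, and the query length of 126 is an assumption taken from the posted sequence and the "57 to 126" range.

```python
def uncovered(query_len, ranges):
    """Return 1-based inclusive query intervals not covered by any range."""
    gaps = []
    next_free = 1  # first query position not yet known to be covered
    for start, end in sorted(ranges):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= query_len:
        gaps.append((next_free, query_len))
    return gaps


# Query ranges (1-based, inclusive) as quoted in this message.
web_blat_ranges = [(26, 125), (57, 126), (29, 64), (58, 82), (28, 51), (80, 102)]
QUERY_LEN = 126  # assumption: length of the posted query sequence

print(uncovered(QUERY_LEN, web_blat_ranges))  # -> [(1, 25)]
```

Only the first 25 or so bases are left without a web BLAT alignment.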

Based on this data, my initial guess is that you are using an unmasked 
version of the genome for BLAT. The web version uses masked genomic 
sequence. The bases represented by the third through sixth alignments may 
be capturing repeat hits. In addition, it seems very likely that the small 
fragment at the beginning of the sequence (absent from all web BLAT 
alignments: bases 1-26 or so) is capturing repetitive hits. If I scroll 
along the genomic sequence to the right of the first alignment (because 
the alignment is to the negative strand), where these 26 bases "should" 
align, very strong RepeatMasker matches to the genomic sequence come up. 
I also looked downstream to the right and there are more strong repeat 
matches. Your sequence falls in a small window between two strong SINE 
elements annotated by RepeatMasker.
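
One quick way to check whether a local genome file carries any masking at all is to count lowercase and N bases. This is a rough sketch, assuming the sequence is available as FASTA text and that repeats are soft-masked as lowercase (a common convention) or hard-masked as N. The function name and the toy record are made up for illustration. It does not show how a particular BLAT run treats case.

```python
def masking_summary(fasta_text):
    """Return (lowercase_fraction, n_fraction) over all sequence lines."""
    total = lower = n_count = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        for base in line:
            total += 1
            if base in "Nn":
                n_count += 1   # hard-masked or unknown base
            elif base.islower():
                lower += 1     # soft-masked base (assumed convention)
    if total == 0:
        return 0.0, 0.0
    return lower / total, n_count / total


# Toy record, not real genome data.
toy = ">toy\nACGTacgt\nNNAC\n"
print(masking_summary(toy))
```

If both fractions are zero for a real chromosome, the file is almost certainly unmasked.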

I hope this helps!
Jennifer Jackson
UCSC Genome Bioinformatics Group

Mittal, Vinay K wrote:
> Hi,
> I was running local blat on my mac osx server for the following sequence:
>
> >seq
> TAGTAGTAGACTAATAACAATAGTAAGAAAAATATTGTTTAATGTATGTATACAATTTATATGGTTTCACAAATCTGTTTAGATGTGTGTTTCAGGCAATTTCTTATTAAAGTTTTTGCTACCTTA
>
> I used the same command line as mentioned on FAQ page to duplicate web BLAT.
>
> From the local blat I got 86 hits on human genome while BLAT on UCSC genome 
> website shows only 6 hits for the same.
> Does web based BLAT (on UCSC website) filter out hits based on some criteria??
>
> Thanks.
>
> Sincerely,
> V.
>
>
>   